diff --git a/AGENTS.md b/AGENTS.md index 7236b04..e8783a5 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -3,86 +3,72 @@ ### Architecture - -* **F3b: chunk terminator + lossy-row migration**: F3b chunk terminator + distillation metrics (schemas v11/v12): \`temporal.partsToText\` joins parts with \`"\n" + CHUNK\_TERMINATOR\` (\`\x1f\`, ASCII Unit Separator); FTS5 unicode61 treats it as separator so BM25 stays byte-identical. v11 migration rewrites legacy rows in-place via nested \`replace()\` on \`\n\[tool:\` and \`\n\[reasoning] \`, gated by WHERE, idempotent. Reader \`truncateToolOutputsInContent\` (\`distillation.ts\`): no \`\x1f\` → fast path; else split on \`\n\x1f\`, truncate per-envelope via \`toolStripAnnotation\`. Cap from \`config().distillation.toolOutputMaxChars\` (default 2000); \`cap=0\` disables; user-role never truncated. v12 adds nullable \`r\_compression\` + \`c\_norm\` REAL columns to \`distillations\`, persisted by \`storeDistillation\`. Both \`loadGen0\` and \`loadForSession\` SELECTs must include them or \`Distillation\` type errors at build. + +* **@loreai/gateway package: transparent LLM proxy for Claude Code/Cursor/etc**: HTTP proxy on port 6969 accepting \`/v1/messages\`. Session ID via \`\[lore:\]\` marker (8 random bytes + unix timestamp). If absent, forces \`stream:false\`, prepends marker, re-encodes SSE. Fallback: SHA-256 of first message. Project path from system prompt regex or \`X-Lore-Project\` header. Node22 bundle. Plugin auto-spawns gateway if not running: probes \`http://127.0.0.1:6969/health\`, spawns via \`Bun.spawn()\` if absent, waits up to 5s for readiness. Skip via \`LORE\_GATEWAY\_MODE=0\` or \`NODE\_ENV=test\`. - -* **Host abstraction: LoreMessage/LorePart types + LLMClient interface decouple core from OpenCode SDK**: \`@loreai/core\` is host-agnostic (zero dep on \`@opencode-ai/sdk\`). 
\`packages/core/src/types.ts\` defines \`LoreMessage\` (discriminated union on \`.role\`), \`LorePart\` (union of text/reasoning/tool/generic parts), \`LoreToolState\`, and \`LLMClient\` interface (single \`prompt(system, user, opts)\` method). \`LoreGenericPart\` has \`type: string\` catch-all which breaks discriminant narrowing — always use exported type guards \`isTextPart\`/\`isReasoningPart\`/\`isToolPart\` from \`./types\` for runtime checks (used throughout \`gradient.ts\` and \`temporal.ts\`). OpenCode adapter at \`packages/opencode/src/llm-adapter.ts\` implements \`LLMClient\`, wrapping \`client.session.create()\` + \`client.session.prompt()\`, owning agent-not-found retry and session rotation. Core modules (distillation/curator/search) accept \`llm: LLMClient\` param. Boundary casts use \`as unknown as\` since Lore and SDK part types are structurally compatible at runtime. + +* **Gateway OpenAI protocol translation layer**: Gateway accepts \`/v1/chat/completions\` and \`/v1/messages\`. \`parseOpenAIRequest()\` maps tool\_calls to tool\_use, preserves extras (temperature, top\_p). \`buildOpenAIResponse()\` handles streaming (SSE) and non-streaming. API key via \`x-api-key\` header → \`Authorization: Bearer\`. - -* **Knowledge entry distribution across projects — worktree sessions create separate project IDs**: Knowledge entries are scoped by project\_id from ensureProject(projectPath). OpenCode worktree sessions (paths like ~/.local/share/opencode/worktree/\/\/) each get their own project\_id. A single repo can have multiple project\_ids: one for the real path, separate ones per worktree session. Project-specific entries (cross\_project=0) are invisible across different project\_ids. Cross-project entries (cross\_project=1) are shared globally. - - -* **Lore search pipeline: FTS5 with AND-then-OR fallback and RRF fusion**: Lore search pipeline (\`src/search.ts\`): FTS5 with AND-then-OR fallback + RRF fusion. 
\`ftsQuery()\` builds AND queries (primary); \`ftsQueryOr()\` builds OR fallback when AND returns zero. Conservative stopword list keeps domain terms like 'handle', 'state', 'type'. \`bm25()\` column weights: title=6, content=2, category=3 (rank negative — more negative = better). Recall tool uses \`reciprocalRankFusion\(lists, k=60)\` across knowledge/temporal/distillation sources. \`forSession()\` uses OR-based BM25 since it ranks all candidates; safety net: top-5 project entries by confidence always included. \`distillation\_fts\` added in migration v7. - - -* **LTM injection pipeline: system transform → forSession → formatKnowledge → gradient deduction**: LTM injected via \`experimental.chat.system.transform\` hook. \`getLtmBudget()\` = (contextLimit - outputReserved - overhead) \* ltmFraction (default 0.05/5%, configurable 2-30%). \`forSession()\` loads project-specific entries unconditionally + cross-project entries scored by term overlap, greedy-packs into budget. \`formatKnowledge()\` renders as markdown. \`setLtmTokens()\` records consumption so gradient deducts it. LTM goes into \`output.system\` — invisible to \`tryFit()\`, counts against overhead budget. Curator updates invalidate \`ltmSessionCache\` which changes system prompt bytes → cache bust on next turn. Default lowered from 10% to 5% in cost-reduction PR — most projects don't fill 10%, smaller system prompt reduces re-write cost on any cache bust. - - -* **Monorepo structure: @loreai/core + opencode-lore packages with Bun workspaces**: Bun workspace monorepo, 3 packages: \`packages/core/\` (\`@loreai/core\`, runtime-agnostic, esbuild → \`dist/node/\` + \`dist/bun/\`), \`packages/opencode/\` (\`opencode-lore\`, ships raw TS — build is no-op echo so \`bun --filter '\*' build\` works uniformly), \`packages/pi/\` (\`@loreai/pi\`, ~132KB esbuild bundle). 
Root \`package.json\` is private with \`workspaces: \["packages/\*"]\` but MUST have \`main\`/\`exports\` pointing to \`./packages/opencode/src/index.ts\` — trampoline required because OpenCode's \`file:///\` plugin loader resolves from repo root. Declarations via \`tsc -p tsconfig.build.json\` into \`dist/types/\`. Tests: \`bun test\` from root with preload at \`packages/core/test/setup.ts\`. Workspace \`package.json\` \`main\`/\`types\` MUST point to built \`dist/\` (Node can't run \`.ts\`); workspace-internal consumers use tsconfig \`paths\` mapping \`@loreai/core\` → \`../core/src/index.ts\`. Build tsconfig MUST NOT include this mapping (TS6059). - - -* **OpenCode built-in compaction fully disabled — Lore owns all context management**: OpenCode built-in compaction fully disabled — Lore owns all context management. \`config\` hook sets \`cfg.compaction = { auto: false, prune: false }\`. Lore overrides manual \`/compact\` via \`experimental.session.compacting\`: chunked distillation via \`backgroundDistill\`, \`loadForSession\` (archived=0 post-F2) injected as \`output.context\`, \`findPreviousCompactSummary\` walks newest-first for assistant messages with truthy \`info.summary\`, \`buildCompactPrompt\` emits \`\\` anchor before SUMMARY\_TEMPLATE. Pi's \`session\_before\_compact\` is deterministic substitution, no anchor. 6 hooks Lore registers in \`packages/opencode/src/index.ts\`: \`config\`, \`event\` (message.updated/session.error/session.idle), \`experimental.chat.system.transform\` (LTM), \`experimental.chat.messages.transform\` (gradient), \`experimental.session.compacting\`, \`tool\` (recall). Any rename of these or upstream compaction not gated by \`compaction.auto\` breaks Lore silently. - - -* **SQLite #db/driver subpath import for Bun/Node dual-runtime**: Core uses Node subpath imports (\`#db/driver\` in package.json) to resolve \`bun:sqlite\` or \`node:sqlite\` at runtime. 
\`driver.bun.ts\` re-exports \`Database\` from \`bun:sqlite\` + \`sha256\` via \`node:crypto\`. \`driver.node.ts\` extends \`DatabaseSync\` from \`node:sqlite\` with \`.query(sql)\` shim using WeakMap statement caching — API parity with \`bun:sqlite\`. Tests run under Bun; esbuild bundles use \`conditions: \["node"|"bun"]\`. API differences: \`.query()\` vs \`.prepare()\`, \`{create:true}\` Bun-only, \`.transaction(fn)\` Bun-only (use manual BEGIN/COMMIT/ROLLBACK for cross-runtime). FTS5/pragmas identical. \`node:sqlite\` stable in Node 22.5+, no native addons. + +* **Plugin primary path: direct hooks with optional gateway fallback**: OpenCode plugin defaults to direct plugin mode (hooks in \`packages/opencode/src/index.ts:1537\`). At startup, probes gateway health (\`http://127.0.0.1:6969/health\`, 1.5s timeout) and attempts spawn via \`Bun.spawn()\` if absent. If gateway comes up, switches to observer-only hooks (lines 1450–1535) that log \`\[lore:verify]\` without mutating output; gateway does the real work. If probe fails or spawn doesn't complete in 5s, falls back to direct plugin path. Disable via \`LORE\_GATEWAY\_MODE=0\` or \`NODE\_ENV=test\`. \*\*Critical for production cost\*\*: worktree sessions use OpenCode build from worktree directory, not the patched BYK/cumulative fork. Worktree OpenCode may lack the tool-part cache mutation fix (patch \`88260b5e8\`), causing unbounded cache busts. Verify worktree is synced or runs with fix applied. ### Decision - -* **Lore standalone ACP server using Pi as agentic engine**: Lore is planned to become a standalone ACP server, independent of OpenCode: speak ACP to editors (Zed, JetBrains), use Pi (\`@mariozechner/pi-coding-agent\`) internally as the agentic loop engine, layer memory via Pi extensions. ACP proxy approach rejected — proxies cannot modify the downstream agent's message array or system prompt, losing gradient context management and LTM injection. As a full ACP agent, Lore owns the LLM interaction. 
Pi chosen for extension hooks (message injection, history filtering, custom compaction, custom tools) that map to Lore's existing OpenCode hooks. + +* **Batch API integration: gateway enhancement, not mandatory architecture shift**: Implementing Anthropic Message Batches API as a gateway-only feature (50% cost savings on distillation/curation workers) does not require mandating gateway for all deployments. Direct plugin path continues working normally; batching is an optional gateway optimization that transparently accumulates non-urgent distill/distill-curation calls, flushes every N seconds, polls results in background. Keeps gateway experimental status while capturing savings on high-volume workers (\`distillSegment\`, \`metaDistill\`, \`consolidate\`, worker validation). Estimate: ~$1,100/month savings on Lore workers alone. + + +* **Batch distillation consumption to reduce cache-bust frequency**: Refresh \`loadDistillations()\` only at turn boundaries (new user message) or after idle gap > cache TTL (~5min). During autonomous tool chains (consecutive assistant→tool→assistant), freeze prefix—no DB hits. Context: prefix refresh costs \`context\_size × $3.75/MTok\` (~$1.88 per bust for 500K Sonnet). New distillations have marginal value mid-chain—model already has raw messages. Turn-boundary refresh reduces 189 arrivals → 8 refresh points in a typical session, cutting cache-bust write cost from $639 → ~$15 (97% reduction). Combine with batching background distill workers: accumulate \`backgroundDistill()\` calls, flush at turn boundaries instead of firing on every \`message.updated\` event. ### Gotcha + +* **Anthropic cache TTL is 5 minutes, not 1 hour — informs refresh policy**: Anthropic prompt cache TTL is ~5 minutes ("Prompt Caching Write (5m)" in console), not 1 hour. Any prefix refresh within a 5-minute warm window pays full cache write cost (\`context\_size × $3.75/MTok\`). 
Refreshes are only "free" after idle gap > TTL (cache already cold). Cost-aware refresh policy must align with TTL: refresh at turn boundaries or idle-resume, never mid-chain. A 500K Sonnet bust costs ~$1.88; frequent refreshes in active sessions waste hundreds of dollars. + -* **Calibration used DB message count instead of transformed window count — caused layer 0 false passthrough**: Gradient/calibration traps: (1) Calibration must use transformed window count via \`getLastTransformedCount()\`, not DB count — delta≈1 → layer 0 passthrough → overflow. (2) \`actualInput\` must include \`cache.write\` — cold-cache otherwise falls to layer 0. (3) Trailing pure-text assistant messages cause Anthropic prefill errors; drop loop must run at ALL layers (layer 0 shares ref with output). Never drop messages with tool parts (\`hasToolParts\`) — infinite loop. (4) Unregistered projects get zero context management → stuck compaction loops; recovery deletes messages after last good assistant message. +* **Calibration used DB message count instead of transformed window count — caused layer 0 false passthrough**: Gradient calibration traps: (1) Use \`getLastTransformedCount()\`, not DB count — delta≈1 causes layer 0 false passthrough → overflow. (2) Include \`cache.write\` in \`actualInput\` — cold cache otherwise falls to layer 0. (3) Drop trailing pure-text assistant messages at ALL layers; never drop messages with tool parts. (4) Unregistered projects get stuck in compaction loops; recovery deletes messages after the last good assistant message. - -* **Test DB isolation via LORE\_DB\_PATH and Bun test preload**: Test DB isolation + schema-aware test helpers: Lore test suite uses isolated temp DB via \`packages/core/test/setup.ts\` preload (\`bunfig.toml\`: \`preload = \["./packages/core/test/setup.ts"]\`). Preload sets \`LORE\_DB\_PATH\` to \`mkdtempSync\` path before any \`src/db.ts\` import; \`afterAll\` cleans up. \`agents-file.test.ts\` needs \`beforeEach\` cleanup; \`TEST\_UUIDS\` cleanup in \`afterAll\` shared with \`ltm.test.ts\`. 
OpenCode-specific tests live in \`packages/opencode/test/\`; driver-level in \`packages/core/test/db-driver.test.ts\`. TRAP: \`packages/opencode/test/index.test.ts:restoreDistillationTables()\` hand-rolls \`CREATE TABLE\` SQL after corruption-recovery tests drop the table — does NOT replay migrations, so any new column added via migration must also be added here or compacting suite fails with \`no such column\`. \`packages/core/test/db.test.ts\` hardcodes the schema version assertion — bump on every \`SCHEMA\_VERSION\` increment. + +* **Lore transform non-determinism breaks prompt cache between API calls**: Each new distillation row changes prefix length → \`tryFitStable()\` recalculates raw window cutoff → entire output bytes change → cache bust. Root cause: \`distilledPrefixCached()\` calls \`addRelativeTimeToObservations(newRows, new Date())\` on each gen-0 row, changing relative-time text on every render. Meta-distillations (17 events) trigger full re-render, collapsing rows (e.g., 10 gen-0 → 1 gen-1) and shrinking prefix. Fix: batch distillation consumption at turn boundaries \[\[019dfa53-b925-70e2-8f84-cab808d8e115]]—freeze prefix during autonomous chains. Secondary: \`sanitizeToolParts()\` at layer 1+ uses \`splice()\`, replacing message array. - -* **toolStripAnnotation path regex catastrophically backtracks; needs slash-fast-exit + scan cap**: \`gradient.ts:329\` \`toolStripAnnotation\` path-extraction regex \`/(?:\[\w.-]+\\/)+\[\w.-]+\\.\w{1,5}/g\` catastrophically backtracks (100KB single-letter ~27s). Two cheap defenses: (1) skip regex when \`output.indexOf("/") === -1\`; (2) cap scan to first 64KB via \`output.slice(0, 64\_000)\`. Pinned with perf regression tests in \`distillation.test.ts\`. F3 expanded blast radius — runs on every distill segment now. 
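The two regex defenses above can be sketched as follows (a minimal sketch — \`extractPaths\` is a hypothetical stand-in; the real code lives inside \`toolStripAnnotation\` at \`gradient.ts:329\`):

```typescript
// Sketch of the two cheap defenses against catastrophic backtracking.
// PATH_RE is the path-extraction regex quoted in the entry above.
const PATH_RE = /(?:[\w.-]+\/)+[\w.-]+\.\w{1,5}/g;
const SCAN_CAP = 64_000;

function extractPaths(output: string): string[] {
  // Defense 1: every match requires a slash — fast-exit when none is
  // present, skipping the backtracking-prone regex entirely.
  if (output.indexOf("/") === -1) return [];
  // Defense 2: cap the scanned region so pathological inputs stay bounded.
  const scanned = output.length > SCAN_CAP ? output.slice(0, SCAN_CAP) : output;
  return scanned.match(PATH_RE) ?? [];
}
```

The fast-exit makes the common no-path case O(n) on a single \`indexOf\`, and the 64KB cap bounds worst-case regex time regardless of tool-output size.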
+ +* **System prompt size bloat from AGENTS.md injection: 41K tokens on single session**: System prompt reaches ~41K tokens: AGENTS.md (project-specific knowledge ~16K) + Lore entries (~7K) + OpenCode base system prompt (~5K) + tool definitions (~8-10K). The 41K is cached read-only (Anthropic caches it), but AGENTS.md's 68KB file (growing with lore exports) is expensive to maintain. Lore entries change frequently (new entries added post-session). Consider: truncate lore entries in AGENTS.md to recent 10-15 instead of all 26, or split knowledge into separate injection path (not system prompt) to avoid system-prompt byte changes on each knowledge update. + + +* **Worktree OpenCode instances lack upstream cache bust fix — enable tool-part caching**: OpenCode worktree instances (created via \`mkdir worktree\`, managed independently from main install) may not have patch \`88260b5e8\` (cache \`msgs\` array across prompt loop iterations). Without it, tool-part state mutations (\`pending\` → \`completed\` + output) between API calls break prompt cache on nearly every call. Real cost impact: session with 667 API calls = 78 busts costing $77 (12% of calls, 99% of cache-write cost). Each bust rewrite grows to 470K tokens. Fix: sync worktree OpenCode to BYK/cumulative branch OR apply patch directly. Workaround: set aggressive layer-0 cap ($0.05/turn for Sonnet) to escalate to layer 1+ where mutations are contained. ### Pattern - -* **F2 metaDistill anchoring + loadForSession archive default**: F2 metaDistill anchoring: \`metaDistill\` (\`packages/core/src/distillation.ts\`) anchors on prior gen>0 meta via \`latestMetaObservations(projectPath, sessionID)\` and only consolidates NEW gen-0 since. \`recursiveUser()\` emits \`\\` block when anchored; byte-identical when absent. Generation chain: \`Math.max(existing.gen, priorMeta.gen) + 1\` — \`loadGen0\` only returns gen=0, must fold in \`priorMeta.generation\`. Threshold: anchored ≥1 new gen-0; first-round ≥3. 
\`loadForSession\` defaults to \`archived=0\`. storeDistillation→archive isn't transactional; mid-crash leaves stale meta + un-archived gen-0s, next run re-consolidates. + +* **Distillation row arrivals trigger cache busts via prefix budget shifts**: Each new gen-0 distillation row (~189 total across session) changes the distilled prefix text length → shrinks raw window budget → \`tryFitStable()\` recalculates raw window cutoff → messages evicted/included from front → entire output array bytes change. Even with \`tryFitStable()\` pinning logic, prefix token growth forces re-evaluation. Result: alternating bust/warm pattern (bust when row arrives, warm on subsequent call with same row count). Meta-distillations compound this: 17 full re-renders with \`new Date()\` can cause relative-time annotations to differ, plus row-count collapse (e.g., 10 gen-0 → 1 gen-1 row) shrinks prefix drastically. - -* **F8 isContextOverflow: mirror upstream OVERFLOW\_PATTERNS verbatim, check data.statusCode for APIError-shape**: F8 isContextOverflow detection (\`packages/opencode/src/index.ts\`): mirrors upstream's \`OVERFLOW\_PATTERNS\` regex list verbatim from \`provider/error.ts\` (19 provider regexes + Cerebras/Mistral 'no body' fallback), all \`/…/i\`. Plugins receive \`session.error\` as \`.toObject()\` wire form: \`{ name, data: { message, responseBody? } }\` — NOT an Error instance, no \`cause\`, no top-level \`statusCode\` (kept on APIError variants as \`data.statusCode\`). Detection order: (1) \`name === 'ContextOverflowError'\`, (2) \`data.statusCode === 413\`, (3) regex over \`data.message ?? message\`. Old \`.includes()\` was case-sensitive — must use regex. + +* **distillSegment urgency tiers: defer-safe vs blocking paths**: Not all \`distillSegment()\` calls tolerate batching latency (up to 1h). Fire-and-forget calls in \`message.updated\` (line 836) and \`messages.transform\` layer≥2 (line 1341) defer result to next turn—batch-safe. 
But overflow recovery (line 888) and \`/compact\` (line 1368) \`await\` result immediately to build recovery/compact prompt—batch-unsafe. Thread \`urgent?: boolean\` flag through \`backgroundDistill()\` to \`distillation.run()\`: urgent=true bypasses batch queue, uses synchronous \`prompt()\`. Batch viable for ~80% of distillation volume (idle/incremental paths). - -* **Idle-resume cache refresh: clear caches when wall-clock gap exceeds prompt cache TTL**: Idle-resume cache refresh: when wall-clock gap between turns exceeds provider's prompt cache TTL (Anthropic: 5min default, 1hr extended), cache is cold. \`@loreai/core\` exports \`onIdleResume(sessionID)\` and \`consumeCameOutOfIdle(sessionID)\`. \`transform()\` records \`lastTurnAt\` per-session. Hook adapters call \`onIdleResume()\` at top of pre-LLM hook (OpenCode: \`experimental.chat.system.transform\`; Pi: \`before\_agent\_start\`) — if \`now - lastTurnAt > idleResumeMinutes\*60\_000\` (config default 60), clears \`prefixCache\` + \`rawWindowCache\` and sets \`cameOutOfIdle=true\`. Adapters also \`ltmSessionCache.delete(sessionID)\`. Flag consumed in LTM-degraded recovery branch. + +* **Gateway model-based upstream routing with fallback to env vars**: Gateway auto-infers upstream provider URL from model prefix (claude-\* → Anthropic, gpt-\* → OpenAI). \`resolveUpstreamRoute()\` maintains routing table matching models to URLs/protocols. Unknown models fall back to \`LORE\_UPSTREAM\_ANTHROPIC\`/\`LORE\_UPSTREAM\_OPENAI\` env vars, enabling zero-config forwarding. - -* **Layer 4 token-budget tail (F7)**: Gradient layer 4 token-budget tail (F7): \`gradient.ts\` layer 4 ('nuclear') replaced fixed \`slice(-3)\` with \`tailBudget = clamp(usable \* 0.25, 2\_000, 8\_000)\`. Walks backward from \`currentTurnStart()\` accumulating \`estimateMessage()\` tokens until exhausted. Current turn (last user + subsequent assistants) ALWAYS included even if it exceeds budget — terminal layer must always return. 
Tool parts NOT stripped (would cause infinite tool-call loop). Distilled prefix unchanged. Variable scoping: \`transformInner()\` declares \`const turnStart\` inside layer 3; layer 4 must use a different name (e.g. \`nuclearTurnStart\`). + +* **Gateway package: new fourth runtime adapter for proxy-based context management**: Gateway package: runtime-agnostic HTTP proxy accepting Anthropic \`/v1/messages\`, applying full Lore pipeline (gradient, LTM, distillation), forwarding upstream. Implements \`LLMClient\` in \`llm-adapter.ts\`. Supports optional interceptor for recording/replay. Plugin spawns gateway if not running (probes \`http://127.0.0.1:6969/health\`, waits 5s), then registers observer hooks in gateway mode to audit gateway decisions without mutating output — logs session ID verification, LTM entries selected, gradient layer/tokens chosen. Observer reads \`temporal\_messages\`, \`knowledge\` tables; runs local \`transform()\` and \`forSession()\` for comparison. - -* **Lore logging: LORE\_DEBUG gating for info/warn, always-on for errors**: \`packages/core/src/log.ts\`: \`log.info()\`/\`log.warn()\` suppressed unless \`LORE\_DEBUG=1|true\`; \`log.error()\` always emits to stderr with \`\[lore]\` prefix. Exists because OpenCode TUI renders all stderr as red error text. Use \`log.info()\` for status, \`log.warn()\` for non-actionable oddities, \`log.error()\` only in catch blocks. Never use \`console.error\` directly. LORE\_DEBUG also gates per-turn gradient diagnostics (layer, tokens, cap, prefix hash, system prompt hash) for cache-bust investigation. TRAP: any diagnostic added via \`log.info()\` is invisible by default — for metrics that must be observable without env-var gating, write to DB columns rather than logs. + +* **Gradient layer transitions trigger cascade of cache busts in Lore**: Late-stage sessions show phase transition at ~step 668: bust rate jumps from 12% → 51%. 
Correlates with context window growth crossing layer-0 cap, escalating to layer-1+ (higher cost, different message restructuring). Each layer transition may alter how gradient injects context, changing message array bytes and invalidating prompt cache. Effect compounds: higher layer cost + more busts = quadratic explosion. Monitor gradient layer choice at step transitions; may need per-layer cache validation or deterministic layer boundary crossing. - -* **Lore release process: craft + issue-label publish**: Release process (craft + issue-label publish): publishes 4 tarballs (\`@loreai/core\`, \`@loreai/opencode\`, \`@loreai/pi\`, \`opencode-lore\` legacy mirror) via \`bun pm pack\` + jq name-swap + repack. \`npm version --workspaces\` fails EUNSUPPORTEDPROTOCOL on \`workspace:\*\` — \`preReleaseCommand: scripts/bump-version.sh\` rewrites \`version\` via jq AND patches \`bun.lock\` workspace version fields via awk. CRITICAL: \`bun pm pack\` rewrites \`workspace:\*\` from \`bun.lock\`, NOT package.json — without lockfile patch, tarballs ship stale deps (ETARGET). \`actions/setup-node@v4\` MUST set \`registry-url: https://registry.npmjs.org\` or OIDC fails ENEEDAUTH. Stale release recovery: if \`release.yml\` ran before a needed PR merged, \`gh issue close \\`, \`git push origin --delete release/X.Y.Z\`, re-run workflow. + +* **Idle-resume cache refresh: clear caches when wall-clock gap exceeds prompt cache TTL**: Clear caches when wall-clock gap exceeds prompt cache TTL. If \`now - lastTurnAt > 60min\`, call \`onIdleResume(sessionID)\` in pre-LLM hook to clear \`prefixCache\`, \`rawWindowCache\`, delete \`ltmSessionCache\`, set \`cameOutOfIdle=true\`. - -* **Pi smoke test pattern for @loreai/pi extension**: Smoke-test @loreai/pi against a real Pi install without touching prod DB. Two flows: (1) Local dist via \`pi -e /path/to/packages/pi/dist/index.js -p '...'\` bypasses discovery. 
(2) \`pi install npm:@loreai/pi@latest\` then \`pi -p '...'\` exercises manifest discovery, peerDeps, jiti alias resolution. Setup: \`npm install -g @mariozechner/pi-coding-agent\`, export \`LORE\_DB\_PATH=/tmp/lore-pi-test.db\`, export \`ANTHROPIC\_API\_KEY\` (from \`~/.local/share/opencode/auth.json:anthropic.key\`), run in scratch cwd. Verify via sqlite3: \`temporal\_messages\` has user+assistant rows (thinking as \`\[reasoning]\` prefix), session IDs use \`pi-\\`. Test recall with pre-inserted beacon + \`--tools bash,recall\`; test LTM injection without granting \`recall\`. + +* **Long-running autonomous sessions hit quadratic cache cost — session length budget needed**: Long-running sessions hit quadratic cache cost via non-deterministic transform. Session with 1,345 API calls: 314 calls (23%) read only 40,913 tokens (system prompt), rewriting 400–690K tokens each (busts). Two root causes: (1) Distillation row arrivals (~189 total) change \`distilledPrefix()\` length → shrink raw window budget → entire message array bytes change. (2) \`sanitizeToolParts()\` line 833 uses \`Date.now()\` to convert pending tool parts to error, producing different timestamps on every \`transform()\` call even with same input. OpenCode's cache fix (e148f00aa) preserves old pending parts in cached array—but Lore re-timestamps them. Fix distillation consumption at turn boundaries \[\[019dfa53-b925-70e2-8f84-cab808d8e115]] and use deterministic timestamp (0 or message.time.created) instead of \`Date.now()\` in sanitizeToolParts. - -* **PR workflow for opencode-lore: branch → PR → auto-merge**: All changes (including minor fixes and test-only changes) must go through a branch + PR + auto-merge, never pushed directly to main. Workflow: (1) git checkout -b \/\, (2) commit, (3) git push -u origin HEAD, (4) gh pr create --title "..." --body "..." --base main, (5) gh pr merge --auto --squash \. Branch name conventions follow merged PR history: fix/\, feat/\, chore/\. 
Auto-merge with squash is required (merge commits disallowed). Never push directly to main even for trivial changes. + +* **Observer hooks in gateway mode: local gradient/LTM audit without mutation**: When gateway is active, plugin registers observer-only hooks that run \`transform()\` and \`ltm.forSession()\` locally on raw messages, then log results as \`\[lore:verify]\` without mutating \`output\`. Session marker scanned from incoming messages for ID verification. Observer reads \`temporal\_messages\` and \`knowledge\` tables to audit gateway's gradient layer, token estimates, and LTM entry selection—enabling side-by-side comparison in production without altering the real data flow (gateway remains the authoritative LLM handler). - -* **Recall logic extracted to core, thin tool wrappers per host**: Recall search+format logic lives in \`packages/core/src/recall.ts\` as host-agnostic \`runRecall({projectPath, sessionID, query, scope, llm, knowledgeEnabled, searchConfig})\` returning a formatted markdown string. Host adapters wrap it in their tool framework: \`packages/opencode/src/reflect.ts\` uses \`tool({args, execute})\` with OpenCode's Zod-ish schema; \`packages/pi/src/reflect.ts\` uses \`pi.registerTool({parameters: Type.Object(...), execute})\` with Typebox. Both adapters are ~75 lines — all BM25/FTS5/RRF fusion, vector search, cross-project discovery, lat.md section search stays in core. Pattern applies to any future host (ACP, CLI): keep logic in \`@loreai/core\`, write a thin tool-framework wrapper per host. + +* **Plugin auto-detects gateway and configures provider baseURLs via config hook**: Plugin auto-detects gateway via health check to \`http://localhost:6969/health\` (1s timeout). If responding, plugin sets \`baseURL: http://localhost:6969\` for all providers (Anthropic, OpenAI, Google, Nvidia, etc), routing requests through gateway. Sets \`LORE\_GATEWAY\_MODE=1\` to disable normal operation. 
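The auto-detect flow above can be sketched as a probe plus a pure config rewrite (hedged: the \`cfg.provider\` shape and helper names are assumptions for illustration, not the plugin's actual exports):

```typescript
// Hypothetical sketch of the gateway auto-detect + baseURL rewrite.
type ProviderCfg = { options?: { baseURL?: string } };

const GATEWAY_URL = "http://localhost:6969";

// 1s-timeout health probe, as described in the entry above.
async function gatewayHealthy(timeoutMs = 1000): Promise<boolean> {
  try {
    const res = await fetch(`${GATEWAY_URL}/health`, {
      signal: AbortSignal.timeout(timeoutMs),
    });
    return res.ok;
  } catch {
    return false; // connection refused or timeout → gateway absent
  }
}

// Pure rewrite: point every configured provider at the gateway.
function routeThroughGateway(providers: Record<string, ProviderCfg>): void {
  for (const p of Object.values(providers)) {
    p.options = { ...p.options, baseURL: GATEWAY_URL };
  }
}
```

Keeping the rewrite pure (probe separate from mutation) makes the config-hook behavior easy to test without a live gateway on port 6969.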
- -* **SQLite migration runner: per-statement atomic, not transactional**: SQLite migration runner (\`packages/core/src/db.ts\`) is per-statement atomic, NOT transactional: \`migrate()\` calls \`database.exec(MIGRATIONS\[i])\` with no BEGIN/COMMIT — write idempotent SQL. Traps: (1) \`ALTER TABLE ADD COLUMN\` is NOT idempotent — duplicate-column error aborts later statements (migration 7 killed \`kv\_meta\` on re-run). Fix: catch duplicate-column errors and re-exec remaining statements; \`recoverMissingObjects()\` runs every startup to idempotently \`CREATE TABLE IF NOT EXISTS\` critical tables (structural only, no backfill — add new critical tables here). (2) \`db()\` must NOT assign \`instance\` until \`migrate()\` succeeds. VACUUM special-cased at \`VACUUM\_MIGRATION\_INDEX = 2\`. FTS5 content-table-backed: per-row UPDATE auto-reindexes; bulk via \`INSERT INTO temporal\_fts(temporal\_fts) VALUES('rebuild')\`. + +* **Plugin resolver shim for workspace source resolution**: Installed plugin at \`~/.config/opencode/plugins/lore.ts\` is a re-export shim (\`export { LorePlugin as default } from "opencode-lore"\`). This allows the plugin to always resolve to workspace source (\`packages/opencode/src/index.ts\`) without requiring rebuild after code changes. Plugin loads the active source directly on each OpenCode startup. -* **Time-gap-aware detectSegments + recall recency RRF**: \`detectSegments\` (\`packages/core/src/distillation.ts\`) prefers splitting at the largest inter-message time gap when that gap is ≥3× the median gap; falls back to count-based splitting for uniform timestamps (preserves legacy behavior). Min segment size 3 — tiny trailing segments still merged into previous. Exported for tests. Recall pipeline (\`packages/core/src/recall.ts:runRecall\`) adds a recency-sorted list of temporal results to RRF fusion alongside the BM25 list, both keyed \`t:\\` so RRF naturally boosts items appearing in both — no new thresholds. 
Pure additive; no schema changes; existing BM25-only behavior preserved when recency list is empty. - -### Preference - - -* **Code style**: No backwards-compat shims — fix callers directly. Prefer explicit error handling over silent failures. Derive thresholds from existing constants rather than hardcoding magic numbers. In CI, define shared env vars at workflow level, not per-job. Dry-run before bulk destructive operations (SELECT before DELETE). Prefer \`jq\`/\`sed\`/\`awk\` over \`node -e\` for JSON manipulation in CI scripts. - - -* **Prompt change discipline: separate user-facing vs worker prompts, gate intelligence-tradeoffs**: User-facing system prompt (injected via \`experimental.chat.system.transform\` in OpenCode, \`before\_agent\_start\` in Pi) is reserved for: one-time greeting, LTM knowledge block, AGENTS.md commit reminder. Do NOT add length caps, verbosity instructions, or behavioral guidance — Anthropic's April 2026 postmortem showed a \`≤25 words between tool calls / ≤100 words final\` instruction caused a 3% intelligence drop. Worker prompts (\`DISTILLATION\_SYSTEM\`, \`CURATOR\_SYSTEM\`, etc. in \`packages/core/src/prompt.ts\`) are isolated and may have strict length limits (curator: ≤150 words). Any change that could trade off intelligence requires: ablation testing, per-model evals, soak period, gradual rollout. See \`docs/PROMPT\_CHANGES.md\` for the review bar. +* **Time-gap-aware detectSegments + recall recency RRF**: \`detectSegments()\` splits at largest inter-message time gap (≥3× median); falls back to count-based for uniform timestamps. Min segment 3. Recall adds recency-sorted temporal results to RRF fusion keyed \`t:\\`, naturally boosting items in both BM25 + recency lists. Pure additive; no schema changes. 
diff --git a/packages/core/src/gradient.ts b/packages/core/src/gradient.ts
index 7808dcd..f6fd468 100644
--- a/packages/core/src/gradient.ts
+++ b/packages/core/src/gradient.ts
@@ -829,8 +829,14 @@ function sanitizeToolParts(
   if (status === "completed" || status === "error") return part;
 
   // pending or running → convert to error so SDK emits tool_result
+  // Use a deterministic timestamp (0) instead of Date.now() so that
+  // repeated transform() calls on the same stale pending part produce
+  // identical bytes. OpenCode's prompt-loop cache fix (e148f00aa)
+  // preserves old pending parts across iterations; Date.now() here
+  // would re-stamp them each call → different bytes → cache bust.
   partsChanged = true;
-  const now = Date.now();
+  const existingStart =
+    "time" in part.state ? part.state.time.start : 0;
   return {
     ...part,
     state: {
@@ -840,8 +846,8 @@
       metadata:
         "metadata" in part.state ? part.state.metadata : undefined,
       time: {
-        start: "time" in part.state ? part.state.time.start : now,
-        end: now,
+        start: existingStart,
+        end: existingStart,
       },
     },
   } as LorePart;
diff --git a/packages/core/test/gradient.test.ts b/packages/core/test/gradient.test.ts
index b2b2995..d2073bc 100644
--- a/packages/core/test/gradient.test.ts
+++ b/packages/core/test/gradient.test.ts
@@ -2181,3 +2181,91 @@ describe("gradient — distillation snapshot caching", () => {
     expect(allText).toContain("Post-idle observation");
   });
 });
+
+describe("gradient — sanitizeToolParts determinism", () => {
+  // sanitizeToolParts converts pending/running tool parts to error state.
+  // It must use deterministic timestamps so repeated transform() calls on the
+  // same stale pending part produce identical bytes (prompt cache stability).
+  const SID = "sanitize-determ-sess";
+
+  beforeAll(() => {
+    setModelLimits({ context: 10_000, output: 2_000 });
+    calibrate(0);
+  });
+
+  afterAll(() => {
+    setModelLimits({ context: 10_000, output: 2_000 });
+    calibrate(0);
+  });
+
+  function makeAssistantWithPendingTool(
+    id: string,
+    toolName: string,
+  ): LoreMessageWithParts {
+    return {
+      info: {
+        id,
+        sessionID: SID,
+        role: "assistant" as const,
+        time: { created: 1000 },
+        parentID: `parent-${id}`,
+        modelID: "claude-sonnet-4-20250514",
+        providerID: "anthropic",
+        mode: "build",
+        path: { cwd: "/test", root: "/test" },
+        cost: 0,
+        tokens: { input: 100, output: 50, reasoning: 0, cache: { read: 0, write: 0 } },
+      },
+      parts: [
+        {
+          id: `part-text-${id}`,
+          sessionID: SID,
+          messageID: id,
+          type: "text" as const,
+          text: "Let me run this tool.",
+          time: { start: 1000, end: 1000 },
+        },
+        {
+          id: `part-tool-${id}`,
+          sessionID: SID,
+          messageID: id,
+          type: "tool" as const,
+          tool: toolName,
+          callID: `call-${id}`,
+          state: {
+            status: "pending" as const,
+            input: { command: "ls -la" },
+          },
+        },
+      ],
+    };
+  }
+
+  test("consecutive transforms produce identical bytes for stale pending tool parts", () => {
+    const messages: LoreMessageWithParts[] = [
+      makeMsg("san-1", "user", "Hello", SID),
+      makeAssistantWithPendingTool("san-2", "bash"),
+    ];
+
+    const result1 = transform({ messages, projectPath: PROJECT, sessionID: SID });
+    // The pending tool part should have been converted to error
+    const toolPart1 = result1.messages
+      .flatMap((m) => m.parts)
+      .find((p) => isToolPart(p));
+    expect(toolPart1).toBeDefined();
+    expect(toolPart1!.state.status).toBe("error");
+
+    // Second call with the exact same messages (simulating OpenCode's cached array)
+    const result2 = transform({ messages, projectPath: PROJECT, sessionID: SID });
+    const toolPart2 = result2.messages
+      .flatMap((m) => m.parts)
+      .find((p) => isToolPart(p));
+    expect(toolPart2).toBeDefined();
+    expect(toolPart2!.state.status).toBe("error");
+
+    // The serialized bytes must be identical — this is what Anthropic's cache sees
+    const json1 = JSON.stringify(toolPart1!.state);
+    const json2 = JSON.stringify(toolPart2!.state);
+    expect(json2).toBe(json1);
+  });
+});
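The cache-stability invariant this diff enforces can be illustrated in isolation. A minimal sketch under a simplified tool-state shape — the `sanitize` helper and its error message are invented for illustration and are not the real `sanitizeToolParts`:

```typescript
// Simplified model of the invariant: sanitizing the same stale pending part
// twice must produce byte-identical state, so repeated transforms never bust
// the prompt cache.
type ToolState =
  | { status: "pending"; input: unknown }
  | { status: "running"; input: unknown; time: { start: number } }
  | { status: "error"; input: unknown; error: string; time: { start: number; end: number } };

function sanitize(state: ToolState): ToolState {
  if (state.status === "error") return state;
  // Deterministic: reuse the existing start time (or 0) — never Date.now(),
  // which would re-stamp the part on every call → different bytes each time.
  const start = "time" in state ? state.time.start : 0;
  return {
    status: "error",
    input: state.input,
    error: "tool was interrupted", // placeholder message for illustration
    time: { start, end: start },
  };
}

const stale: ToolState = { status: "pending", input: { command: "ls" } };
const a = JSON.stringify(sanitize(stale));
const b = JSON.stringify(sanitize(stale));
// a === b: every call yields the same bytes, which is what the cache compares
```

A `Date.now()` call in place of `start` would make `a !== b` whenever the two calls cross a millisecond boundary, which is exactly the nondeterminism the diff removes.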