
perf(db): split embeddings + in-memory similarity cache + drop redundant co_changes index #2

Closed
andreinknv wants to merge 1 commit into
pr-111-rebasedfrom
perf/db-optimizations-llm

Conversation

@andreinknv (Owner)

Summary

Three measurable wins on the LLM-tier data layer, validated by spike before implementing each:

| Change | Win | Spike file |
| --- | --- | --- |
| F2: drop `idx_co_changes_a` | Covered by the `(file_a, file_b)` PK (analogous to PR #1) | (n/a) |
| G: split embeddings into `symbol_embeddings` table | 3.22× faster summary-only scans | `scripts/spikes/spike-embedding-split.mjs` |
| H: in-memory `EmbeddingCache` for similarity search | 4.4× faster top-K cosine | `scripts/spikes/spike-embedding-split.mjs` |

Note: This PR sits on top of colbymchenry#111 (LLM symbol summaries). The spike script reproducer covers G and H; F2 follows the same left-prefix-scan-covers-narrow-index pattern as PR #1 (perf/drop-redundant-edge-indexes).

Empirical validation

Run it yourself: node scripts/spikes/spike-embedding-split.mjs. Output on a 50K-summary / 768d-embedding synthetic DB:

--- Spike G: storage layout (inline vs split) ---

  inline DB: 196.5 MB
  split  DB: 204.0 MB

  Test: scan summaries by role (common path)
  inline: 46ms avg over 50 queries
  split : 14ms avg over 50 queries
  Δ summary-only: split is 3.22× faster

  Test: scan summaries WITH embedding (rare path)
  inline (single table)   : 71ms avg over 50 queries
  split  (join required)  : 80ms avg over 50 queries
  Δ summary+embedding: 1.12× cost penalty for split

--- Spike H: in-memory embedding cache ---

  cold (per-query SQLite fetch + decode): 104ms avg over 20 queries
  warm (in-memory Float32Array matrix)  : 24ms avg over 20 queries
  Δ similarity search: 4.4× speedup with in-memory cache

The 1.12× cost on summary+embedding scans is dwarfed by the 3.22× win on summary-only scans, which dominate by ~50× in real usage (FTS-anchor lookups, role filters, freshness checks all read summaries-without-embeddings).

What changes

F2 — Migration 015: drop idx_co_changes_a

  • co_changes has PRIMARY KEY (file_a, file_b), which automatically creates a B-tree leading on file_a. SQLite covers WHERE file_a = ? lookups via that PK index — the standalone idx_co_changes_a was redundant.
  • idx_co_changes_b (on file_b alone) is kept because the PK leads with file_a, so it cannot serve WHERE file_b = ? lookups.
  • Fresh-DB schema (src/db/schema.sql) updated to skip idx_co_changes_a and dedupe a pre-existing duplicate of idx_co_changes_b.
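The left-prefix argument above can be illustrated without SQLite: a composite index on `(file_a, file_b)` keeps rows sorted by `file_a` first, so every row for one `file_a` sits in a contiguous run reachable by binary search, while rows for one `file_b` are scattered. A minimal sketch (data and helper invented here for illustration):

```typescript
// Illustration only: rows as a composite index on (file_a, file_b) would
// store them -- sorted by file_a first, then file_b.
type Row = { file_a: string; file_b: string };

const index: Row[] = [
  { file_a: "a.ts", file_b: "b.ts" },
  { file_a: "a.ts", file_b: "c.ts" },
  { file_a: "b.ts", file_b: "a.ts" },
  { file_a: "c.ts", file_b: "a.ts" },
];

// Left-prefix scan: binary-search the first row whose file_a matches,
// then walk forward while the prefix still matches. This is why the PK
// already serves WHERE file_a = ? but cannot serve WHERE file_b = ?.
function prefixScan(rows: Row[], key: string): Row[] {
  let lo = 0, hi = rows.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (rows[mid].file_a < key) lo = mid + 1; else hi = mid;
  }
  const out: Row[] = [];
  for (let i = lo; i < rows.length && rows[i].file_a === key; i++) out.push(rows[i]);
  return out;
}
```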

G — Migration 016: split embeddings into symbol_embeddings

CREATE TABLE symbol_embeddings (
    node_id TEXT PRIMARY KEY,
    embedding BLOB NOT NULL,
    embedding_model TEXT NOT NULL,
    FOREIGN KEY (node_id) REFERENCES symbol_summaries(node_id) ON DELETE CASCADE
);
INSERT OR IGNORE INTO symbol_embeddings (node_id, embedding, embedding_model)
  SELECT node_id, embedding, embedding_model
  FROM symbol_summaries
  WHERE embedding IS NOT NULL AND embedding_model IS NOT NULL;
DROP INDEX IF EXISTS idx_summaries_embedding_model;
ALTER TABLE symbol_summaries DROP COLUMN embedding;
ALTER TABLE symbol_summaries DROP COLUMN embedding_model;
  • Requires SQLite 3.35+ for ALTER TABLE DROP COLUMN. Both better-sqlite3 and node-sqlite3-wasm ship with newer versions, so this is safe.
  • queries.ts methods (getEmbeddableSummaries, getAllEmbeddings, upsertSymbolEmbedding) updated to use the new table.
  • clear() and clearCoChanges() extended to wipe both tables (the FK cascade would handle it, but explicit is safer if foreign-key enforcement gets disabled).

H — EmbeddingCache in src/llm/embeddings.ts

export class EmbeddingCache {
  get(fetcher: EmbeddingFetcher, model: string): CachedEmbeddings;
  invalidate(): void;
}
  • Decodes every embedding into a flat Float32Array matrix once per (model, generation).
  • New topKByCosineMatrix(query, matrix, ids, dim, k) operates directly on the flat layout.
  • Owned by the CodeGraph instance.
  • Invalidated on:
    • indexAll end (when filesIndexed > 0)
    • sync end (when files added/modified/removed)
    • After embedAllSummaries (when generated > 0)
    • clear() (table was emptied)
  • Mismatched-dim rows are skipped on rebuild — produces a packed matrix with no sparse holes.
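The flat-layout idea behind `topKByCosineMatrix` can be sketched as follows (this is a simplified stand-in, not the PR's implementation): row `i` of the matrix occupies `[i*dim, (i+1)*dim)` in one `Float32Array`, so a similarity pass is a single tight loop with no per-row BLOB decode.

```typescript
// Sketch: top-K cosine similarity over a packed Float32Array matrix.
function topKByCosineFlat(
  query: Float32Array,
  matrix: Float32Array,
  ids: string[],
  dim: number,
  k: number,
): { id: string; score: number }[] {
  let qNorm = 0;
  for (let d = 0; d < dim; d++) qNorm += query[d] * query[d];
  qNorm = Math.sqrt(qNorm) || 1;

  const scored: { id: string; score: number }[] = [];
  for (let i = 0; i < ids.length; i++) {
    const base = i * dim; // row i starts here in the flat layout
    let dot = 0, norm = 0;
    for (let d = 0; d < dim; d++) {
      const v = matrix[base + d];
      dot += query[d] * v;
      norm += v * v;
    }
    scored.push({ id: ids[i], score: dot / (qNorm * (Math.sqrt(norm) || 1)) });
  }
  return scored.sort((a, b) => b.score - a.score).slice(0, k);
}
```

Because `ids[i]` must line up with row `i`, skipping mismatched-dim rows during rebuild (rather than leaving holes) is what keeps this loop correct.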

Independent review

Reviewed by an independent reviewer agent (read-only, fresh context). Surfaced 3 issues, all addressed in the same diff:

  1. CodeGraph.clear() was missing a cache invalidation — fixed: added this.embeddingCache.invalidate().
  2. Duplicate idx_co_changes_b in schema.sql (pre-existing, harmless due to IF NOT EXISTS) — fixed: deduped.
  3. EmbeddingCache left sparse undefined holes in ids on dim-mismatch — fixed: rewrote to push aligned entries instead of pre-allocating, with a regression test.

Test plan

  • npx tsc --noEmit clean
  • npx vitest run: 794/794 tests pass
  • New tests: __tests__/migrations-015-016.test.ts (upgrade + fresh-DB paths for both migrations)
  • New tests in __tests__/embeddings.test.ts:
    • topKByCosineMatrix matches topKByCosine on the same data
    • EmbeddingCache hit/miss/invalidate
    • Cache returns empty result without re-fetching
    • Cache skips dim-mismatched rows
  • Spike script (scripts/spikes/spike-embedding-split.mjs) reproduces the headline numbers

Files changed

| File | Change |
| --- | --- |
| `src/db/migrations/015-prune-co-changes-index.ts` | New: drop `idx_co_changes_a` |
| `src/db/migrations/016-split-symbol-embeddings.ts` | New: split embeddings into dedicated table |
| `src/db/migrations/index.ts` | Register both new migrations |
| `src/db/schema.sql` | Fresh-DB schema reflects new layout; dedupe `idx_co_changes_b` |
| `src/db/queries.ts` | `getAllEmbeddings`, `upsertSymbolEmbedding`, `getEmbeddableSummaries` use `symbol_embeddings` |
| `src/llm/embeddings.ts` | `EmbeddingCache` + `topKByCosineMatrix` |
| `src/index.ts` | Cache integrated into `searchHybrid` and `findSimilar`; invalidations wired |
| `__tests__/migrations-015-016.test.ts` | New: focused migration tests |
| `__tests__/embeddings.test.ts` | New: cache + matrix tests |
| `__tests__/foundation.test.ts` | Updated schema-version expectation |
| `__tests__/pr19-improvements.test.ts` | Updated schema-version expectation |
| `scripts/spikes/spike-embedding-split.mjs` | New: G + H reproducer |

🤖 Generated with Claude Code

…emory similarity cache

Three independently measurable wins on the LLM-tier data layer,
all validated up-front via spike before implementation
(scripts/spikes/spike-embedding-split.mjs).

F2 - drop idx_co_changes_a (migration 015)
  The (file_a, file_b) PRIMARY KEY index already covers
  WHERE file_a = ? via SQLite left-prefix scan, so the narrow
  idx_co_changes_a was dead weight. idx_co_changes_b (on file_b
  alone) is kept because the PK leads with file_a.

G - split embeddings into a dedicated table (migration 016)
  Moves the 768-dim Float32 BLOB out of symbol_summaries into a
  new symbol_embeddings table with FK + ON DELETE CASCADE.
  Spike measurement on a 50K-summary synthetic DB:
    - Summary-only scan (common path): 3.22x faster (46ms -> 14ms)
    - Summary+embedding scan (rare path): 1.12x cost penalty
    - DB size: ~4% larger (separate page chain)
  Net positive: the common path dominates real usage.

H - in-memory EmbeddingCache for similarity search
  EmbeddingCache decodes every embedding into a flat Float32Array
  matrix once and reuses it across queries. topKByCosineMatrix
  operates directly on the flat layout. Cache is invalidated on
  indexAll, sync, embedAllSummaries (when generated > 0), and
  clear() - anywhere new vectors land or the table is emptied.
    - Cold (per-query SQLite fetch + decode): 104ms avg
    - Warm (in-memory matrix): 24ms avg
    - 4.4x speedup with cache

Also fixes a pre-existing schema.sql inconsistency where
idx_co_changes_b was declared twice (harmless thanks to
IF NOT EXISTS, but confusing).

Test coverage:
  - __tests__/migrations-015-016.test.ts: upgrade-path and
    fresh-DB behavior for both new migrations.
  - __tests__/embeddings.test.ts: topKByCosineMatrix matches
    topKByCosine; EmbeddingCache hit/miss/invalidate/dim-mismatch.

794/794 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@andreinknv (Owner, Author)

Superseded by upstream PR colbymchenry#123

@andreinknv andreinknv closed this Apr 28, 2026
andreinknv added a commit that referenced this pull request Apr 30, 2026
Two fixes surfaced while verifying b6cef74 against the live ollama and
codegraph indexes:

1. buildReviewContext spread clobbered DEFAULTS with undefined.
   { ...DEFAULTS, ...options } with options.maxCoChangeWarnings set to
   undefined (the shape MCP forwards when the caller did not override)
   set opts.maxCoChangeWarnings to undefined. undefined > 0 is false,
   so the co-change loop was skipped entirely — coChangeWarnings: [] on
   every default review_context call. Same shape silently disabled the
   jaccard threshold and uncapped callers/callees. Replaced the spread
   with a manual merge that ignores undefined values.

2. searchHybrid only diversified the strict no-embeddings-backend path.
   When embeddings are configured but the cache is empty (the common
   summarize:false state, which both project configs use), the call
   fell through to ftsResults.slice(0, limit) without diversification —
   letting six New constructors flood a 12-result budget. Same issue on
   the embed-call-failed and empty-vec-result paths. Routed all three
   FTS-only fallbacks through diversifyByName.
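The spread-clobbering bug in fix #1 above reduces to a one-liner: `{ ...DEFAULTS, ...options }` lets an explicitly-present `undefined` overwrite a default. A minimal sketch of the manual-merge fix (option names simplified; only `maxCoChangeWarnings` and the jaccard threshold are taken from the commit text):

```typescript
interface ReviewOptions { maxCoChangeWarnings?: number; jaccardThreshold?: number }

const DEFAULTS: Required<ReviewOptions> = { maxCoChangeWarnings: 5, jaccardThreshold: 0.3 };

// Manual merge that ignores undefined values, so an MCP-forwarded
// { maxCoChangeWarnings: undefined } no longer disables the defaults.
function mergeOptions(options: ReviewOptions): Required<ReviewOptions> {
  const merged = { ...DEFAULTS };
  for (const key of Object.keys(options) as (keyof ReviewOptions)[]) {
    const value = options[key];
    if (value !== undefined) merged[key] = value;
  }
  return merged;
}
```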

Regression tests in stress-test-roundtwo-fixes.test.ts exercise both
paths through the real surface (MCP-shape options for #1, configured-
but-unpopulated embeddings for #2). Suite: 1074 / 13 skip / 0 fail.
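The intent of routing FTS-only fallbacks through `diversifyByName` (fix #2 above) can be sketched as a per-name cap on results. The cap value and shapes below are invented for illustration; the real `diversifyByName` may differ:

```typescript
interface Hit { name: string; file: string }

// Cap how many results share one symbol name, so e.g. six identical
// `New` constructors cannot flood a 12-result budget.
function diversifyByName(hits: Hit[], limit: number, perName = 2): Hit[] {
  const seen = new Map<string, number>();
  const out: Hit[] = [];
  for (const hit of hits) {
    const count = seen.get(hit.name) ?? 0;
    if (count >= perName) continue; // over the per-name cap: skip
    seen.set(hit.name, count + 1);
    out.push(hit);
    if (out.length === limit) break;
  }
  return out;
}
```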

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request Apr 30, 2026
End-to-end measurement of the cache-preservation work uncovered a
real cost/quality tradeoff: bulk summary work runs 2.6x faster on a
local MLX-served model (Qwen3-Coder-30B-A3B-Instruct-4bit-DWQ via
mlx-openai-server) than on claude-bridge -> Haiku, with comparable
quality. But the same Qwen3-Coder model confidently confabulates
WRONG answers on synthesis-heavy ask queries — got the PRAGMA
foreign_keys direction reversed and cited the wrong migration
filename when asked about the cache-preservation mechanism. Sonnet
via claude-bridge gave precisely correct answers, but takes 12-40s.

The pre-existing schema (chat.askModel: string) only allowed swapping
the model id within the SAME provider/endpoint. To route bulk and
ask through entirely DIFFERENT providers, this PR adds a top-level
`askChat` block.

## Changes

### Schema (src/types.ts, src/llm/client.ts)

New optional `llm.askChat` block mirroring `chat`:

    "chat":    { "provider": "openai-compat",
                 "endpoint": "http://localhost:8081/v1",
                 "model": "qwen3-coder" },
    "askChat": { "provider": "claude-bridge",
                 "model": "claude-sonnet-4-6" }

`LlmEndpointConfig.askChat?: ChatProviderConfig | null` parallels the
existing `chat` slot. `normalizeEndpointConfig` returns the new field
alongside chat + embeddings.

### Resolver (src/llm/provider.ts)

New `resolveAskChat()` mirrors `resolveChat()` but never reads legacy
flat fields — askChat is opt-in only. Wired into `resolveLlmProviders`
so the resulting `ResolvedLlm` carries both `chat` and `askChat` slots.

New `getAskModel()` helper with cascade:
- askChat.model -> chat.askModel -> chat.model -> chatModel (legacy)
This is the right primitive for any caller that wants to know which
model id will actually answer an ask/dead-code call.
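The cascade reads naturally as a chain of nullish fallbacks. A sketch under simplified config shapes (the real `ResolvedLlm` types are richer):

```typescript
interface LlmConfig {
  askChat?: { model?: string } | null;
  chat?: { model?: string; askModel?: string } | null;
  chatModel?: string; // legacy flat field
}

// askChat.model -> chat.askModel -> chat.model -> chatModel (legacy)
function getAskModel(cfg: LlmConfig): string | undefined {
  return (
    cfg.askChat?.model ??
    cfg.chat?.askModel ??
    cfg.chat?.model ??
    cfg.chatModel
  );
}
```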

### Client routing (src/llm/client.ts)

`LlmClient` now lazy-loads TWO backends instead of one:
- `chatBackend` — for bulk work (summaries, classifier, dir summaries).
- `askChatBackend` — for ask + dead-code judge, only when askChat is
  configured.

`LlmClient.chat(messages, { useAskModel: true })` checks `askChatCfg`:
- If set → dispatch to ask backend with `useAskModel: false` (the ask
  backend's own `model` field is already the right model; no re-swap
  needed within that backend).
- If unset → dispatch to chat backend with `useAskModel` passed
  through (legacy single-provider behaviour: same backend, swap model
  id via `chat.askModel`).
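The dispatch rule above can be condensed to a few lines. A sketch with backends reduced to plain functions (the real `LlmClient` lazy-loads and constructs them):

```typescript
// A backend takes useAskModel and returns the model id it would call.
type Backend = (useAskModel: boolean) => string;

function route(
  chatBackend: Backend,
  askChatBackend: Backend | null,
  useAskModel: boolean,
): string {
  // Split-provider: the ask backend's own `model` is already right,
  // so no re-swap inside that backend (useAskModel = false).
  if (useAskModel && askChatBackend) return askChatBackend(false);
  // Legacy single-provider: same backend, swap model id via chat.askModel.
  return chatBackend(useAskModel);
}
```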

`instantiateBackend` factored out from `getChatBackend` so both paths
share construction logic.

`isReachable()` now also probes the askChat backend when configured —
returns false if EITHER chat OR askChat fails, so status output never
claims reachable when ask calls would throw at runtime (e.g.
claude-bridge binary missing on the ask side).

### Display + gating (src/bin/codegraph.ts, src/mcp/tools.ts)

- `codegraph status` shows a separate `Ask model:` line when ask
  routes differently from chat. Provider hint in parentheses
  (`Ask model: claude-sonnet-4-6 (claude-bridge)`) appears only in
  true split-provider setups, not for single-provider askModel
  overrides.
- `codegraph ask` CLI trailer now uses getAskModel for the displayed
  model id so the printed `model X` matches what actually generated
  the answer.
- handleAsk and handleDeadCode in the MCP tools now gate on
  getAskModel rather than getChatModel, with error messages updated
  to mention the askChat block as an alternative configuration path.
  Pre-fix, a config with chat=null but askChat configured (rare but
  legitimate) would have failed the gate even though ask was
  perfectly configured.

### Tests (__tests__/llm.test.ts)

Two new tests:

1. `split provider: useAskModel routes to askChat backend when
   configured` — uses TWO fake servers, asserts bulk hits chat server
   only and ask hits askChat server only. Captures and asserts the
   `model` field in the request body of each, guarding against a
   regression where routing is right but the model id sent is wrong.

2. `legacy single-provider: useAskModel stays on chat backend, swaps
   model id` — asserts no-askChat config preserves prior behaviour
   (single backend handles both calls, model id swaps).

`FakeServer` extended with `lastChatBody` capture so tests can assert
which model id reached which server.

## Live validation

Tested on the codegraph self-repo with chat -> MLX/Qwen3-Coder and
askChat -> claude-bridge/Sonnet. Same ask question that Qwen3-Coder
confabulated about now returns Sonnet's precisely correct answer:
- Pre-split: 6.9s, wrong on PRAGMA foreign_keys direction, cited
  migration 014 (unrelated).
- Post-split: 39.5s, precisely correct including line numbers
  (`summarizer.ts:189-197`, `queries.ts:2125`) and the right
  migration filename (022-add-content-hash-index).

## Backwards compatibility

Existing configs with just `chat: { ..., askModel: "..." }` continue
to work unchanged — askChat is optional. Test #2 covers this path.

## Reviewer trail

Two passes. Pass 1: REQUEST_CHANGES + 4 findings.
- (1) handleAsk/handleDeadCode used getChatModel — addressed,
  switched to getAskModel with updated error messages.
- (2) status display omitted ask model — addressed, added
  conditional Ask model line in split-provider setup.
- (3) isReachable didn't probe askChat — addressed.
- (4) split-provider test didn't assert request-body model id —
  addressed.

Pass 2: APPROVE + 2 info findings (stale test comment, redundant
provider hint for single-provider askModel override) — both
addressed.

Suite: 1089 / 13 skip / 0 fail (was 1087).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 1, 2026
… docs

Three reviewer-flagged improvements on 6d0e7a2 (the WASM-visibility
commit). All informational items from that review — no functional
regression, no scope drift, suite stays at 1117/13/0.

## #1 — backend is now per-DatabaseConnection, not a process global

`createDatabase` previously set a module-level `activeBackend` and
exposed it via `getActiveBackend()`. In the MCP cross-project path
(handleStatus called against `projectPath` opens a SECOND DB via
openSync) the global reflected whichever DB was opened most recently,
not necessarily the one whose stats were just rendered. Benign in
practice (every DB in a process resolves the same backend) but
structurally imprecise.

Refactor:
- `createDatabase(dbPath)` now returns `{db, backend}` instead of a
  bare `SqliteDatabase`. Caller stores both.
- `DatabaseConnection` carries a `private backend: SqliteBackend` and
  exposes `getBackend()`.
- `CodeGraph.getBackend()` delegates — that's the public surface.
- CLI `codegraph status` and MCP `handleStatus` both call
  `cg.getBackend()` instead of the global. The global is removed.

Two pre-existing tests (`migrations-015-016`, `migrations-022`) that
called `createDatabase` directly now destructure `{db: adapter}`.

## #2 — fix recipe deduplicated across the two code surfaces

The `xcode-select` / `npm rebuild` / `npm install --save` recipe
appeared inline in both `buildWasmFallbackBanner` (sqlite-adapter.ts)
and the MCP `handleStatus` formatter (mcp/tools.ts). New
`WASM_FALLBACK_FIX_RECIPE` constant in sqlite-adapter.ts is the
single source for the one-line summary; the MCP formatter
interpolates it. The banner formats the same content multi-line for
the stderr surface. README is intentionally separate (different
audience, different rendering).

## #3 — README troubleshooting now covers Linux

Section title renamed "Indexing is slow on macOS / WASM fallback" ->
"Indexing is slow / WASM fallback active". New code block lists fix
steps for macOS, Debian/Ubuntu, RHEL/Fedora, and the cross-platform
`npm install --save` escape hatch. The banner stderr block also
gained the Linux equivalent for symmetry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 1, 2026
Both items came back APPROVE-with-info on the prior review and are
pure cleanup — no behavior change, no API surface change.

## #1 — RHEL/Fedora step in WASM_FALLBACK_FIX_RECIPE constant

The constant in `src/db/sqlite-adapter.ts` is documented as the
"single source of truth" for the fix recipe shown in the MCP
`Backend:` line, but it only listed macOS + Debian/Ubuntu paths. The
multi-line `buildWasmFallbackBanner` and the README both already
include the `yum groupinstall "Development Tools"` step for
RHEL/Fedora; the constant was the lone surface missing it. Now
appended so MCP-displayed guidance matches the other two surfaces
on every supported platform.

## #2 — hoist inline `import('./db').SqliteBackend` to top-level

`CodeGraph.getBackend()` was the only method in `src/index.ts` using
the inline-import type form. Adding `SqliteBackend` to the existing
`import { DatabaseConnection, getDatabasePath } from './db'` keeps
the file consistent. No circular-import risk since `./db` already
re-exports the type from `./db/sqlite-adapter`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 1, 2026
… server-config flags

Tooling-gap backlog (codegraph/docs/codegraph-tooling-gaps.md) closed:
  #1 freshness severity bucket — `classifyFreshness` with fresh|recent|stale|very_stale
  #2 allowStale flag — opt-in bypass for the heavy-drift gate, registry-injected schema
  #3 module format in status — `module-format.ts` parses package.json + tsconfig (JSONC-safe)
  #4 codegraph_imports tool + import-classifier — file/directory/bare/unresolvable filters
  #5 dynamic imports — extractor catches `import('…')` + `require('…')`, incl. template_string
  #6 build-context refs — new `build_context_refs` table for `__dirname` / `import.meta.*`
  #7 files.is_test flag — column populated by glob; surfaced in status as `(N test)`
  colbymchenry#11 summarize-also-embeds (discovered while dogfooding) — `cg.summarizeAll()` chains
       `embedAllSummaries`; new `cg.embedAll()` for embed-only path; CLI `codegraph embed`
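The severity buckets in #1 can be sketched as below. Thresholds are invented here (the real `classifyFreshness` inputs and cutoffs live in the PR); the one documented constraint, from a later reviewer fix, is that very_stale requires the index to actually be out of sync:

```typescript
type Freshness = "fresh" | "recent" | "stale" | "very_stale";

// Hypothetical bucket logic: an in-sync index can only be fresh or
// recent, however old it is; very_stale additionally requires isStale.
function classifyFreshness(ageDays: number, isStale: boolean): Freshness {
  if (!isStale) return ageDays <= 1 ? "fresh" : "recent";
  return ageDays > 30 ? "very_stale" : "stale";
}
```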

CLI/MCP alignment (5/32 → 33+/35):
  - 13 new CLI commands via `runViaMCP` shim: callers, callees, impact, node, similar,
    biomarkers, imports, help-tools, explore, hotspots, dead-code, config-refs, sql-refs,
    module-summary, role, coverage-query, pending-summaries, save-summaries, review-context
  - 7 new MCP tools: codegraph_imports, codegraph_embed, codegraph_summarize, codegraph_sync,
    codegraph_reindex, codegraph_coverage_ingest, codegraph_init, codegraph_uninit,
    codegraph_unlock, codegraph_affected

MCP server-level operator config (`codegraph serve --mcp`):
  - --no-write-tools / --allow-stale-default / --disable-tool (sandboxing)
  - --llm-endpoint / --llm-chat-model / --llm-ask-model / --llm-embedding-model /
    --llm-api-key (operator LLM config; per-project config wins on conflict)
  - New CODEGRAPH_LLM_* env vars wired through `mergeLlmEnv` in resolveLlmProviders

Architectural cleanups:
  - `bypassFreshnessGate` and `isWriteTool` declarative flags on ToolModule (replaces
    growing string-comparison chain in execute())
  - `withAllowStale` registry injection only on tools that DO see the gate
  - DRY of inline copy-paste in 3 hooks → `src/index-hooks/enclosing.ts`
  - `LlmClient.isEmbeddingReachable` for split-provider correctness
  - SyncResult `lockContention` flag → handleSync emits distinct retryable message
  - `clearStructural` deletes from build_context_refs (was orphan-leaking on --force)
  - cli:dev npm script + tsx CLI fixed (web-tree-sitter `import type` for type-only refs)

Migrations:
  023-files-is-test.ts — add `files.is_test`
  024-build-context-refs.ts — add `build_context_refs` table

Reviewer rounds: 11 total, all REQUEST_CHANGES addressed inline. Notable fixes:
  - JSONC URL strip via state machine (was eating `https://` tails)
  - classifyFreshness very_stale now requires isStale (in-sync-but-old → recent)
  - Dynamic imports also match template_string nodes
  - process.exit deferred until after finally cleanup in runViaMCP
  - --same-language / --different-language mutual exclusion guard
  - help-tools CLI bypasses isInitialized (works without a project)
  - handleUninit sweeps projectCache by getProjectRoot (no dangling alias leaks)
  - handleAffected errors instead of silently dropping unsupported glob filters
  - mergeLlmEnv preserves precedence: legacy flat config wins over env-synthesised block

Suite: 1268 passing, 1 expected red (colbymchenry#8 — undecided), 13 skipped, 1 todo, 0 regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 3, 2026
upsertSymbolSummary and upsertSymbolEmbedding fired SqliteError:
FOREIGN KEY constraint failed when a long-running write — the
summarizer mid-LLM-call, the embedder mid-batch, the MCP server
mid-tool — held a node id across a sync that deleted the symbol.
Repro: capture id, delete the source file, run sync, attempt the
upsert. Live race for the MCP path because the freshness gate
auto-syncs before each tool but in-flight LLM/embedding requests
straddle that boundary.

The fix
-------
Atomic SQL guard at the upsert layer. Each upsert switches from
`INSERT ... VALUES` to `INSERT ... SELECT ... WHERE EXISTS (...)`
gated on the FK target — `nodes` for the summary, `symbol_summaries`
for the embedding (the actual FK target; nodes-cascade keeps the
two in sync). When the parent is gone the SELECT yields zero rows,
the INSERT does nothing, ON CONFLICT remains the re-upsert path
when the parent IS present. Single-statement, race-free even
cross-process — there's no SELECT-then-INSERT window for a
concurrent delete to slip into.
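The semantics of the guard (the real fix is a single `INSERT ... SELECT ... WHERE EXISTS` SQL statement; this in-memory analogue only models its observable behavior) come down to: write only if the FK parent still exists, and report whether a write happened.

```typescript
// In-memory analogue of the WHERE EXISTS guard.
function guardedUpsert(
  parents: Set<string>,      // stands in for the FK target table
  table: Map<string, string>, // stands in for symbol_summaries / symbol_embeddings
  nodeId: string,
  value: string,
): boolean {
  if (!parents.has(nodeId)) return false; // parent deleted mid-flight: no-op
  table.set(nodeId, value);               // insert, or ON CONFLICT-style replace
  return true;                            // caller can keep counters accurate
}
```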

Both helpers now return `boolean` (true wrote, false skipped) so
callers can keep counters accurate. Summarizer + embedder no longer
increment `generated`/`cacheHits` on stale-skip; the existing
`skipped = candidates - generated - errors - cacheHits` derivation
absorbs them naturally without polluting `errors` (which is reserved
for real LLM/network failures). saveAgentSummaries reports a
mid-write disappearance via the existing skipped/errors trail.

Tests
-----
__tests__/fk-stale-handle.test.ts — 4 cases: stale-id summary upsert
no-ops (was FK-failing), stale-id embedding upsert no-ops, normal
write still works for both. Confirmed the first test fails with
SqliteError: FOREIGN KEY constraint failed on the unfixed code by
temp-reverting the WHERE EXISTS clause.

Suite: 1365/13/0 (was 1361/13/0; +4 new). tsc clean.

Reviewer pass found two issues, both addressed before commit:
- agent-bridge.ts was discarding the new boolean — counters could
  overcount on cross-process race; now captures + reports skip.
- The first draft incremented `errors` on stale-skip; reviewer
  flagged the semantic overload (errors is for real failures).
  Resolved by not incrementing any outcome counter — the derived
  `skipped` metric already covers it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 3, 2026
Second commit of the trace-logging arc. Wraps every
toolHandler.execute() call in a TraceLogger.log() so the call
flows into the mcp_tool_calls table that landed in 027139b.
The viewer's Agent-trace tab (commit C) reads it back.

Lifecycle:
- TraceLogger is created lazily once `cg` is open (via
  ensureTraceLogger() called on each dispatch). Skipped entirely
  when --no-write-tools is passed: the spirit of that flag is "no
  DB writes", and trace logging is a write path.
- log() is contractually best-effort (DB failures swallowed at
  debug). The dispatch site additionally wraps it in try/finally
  so even a contract violation can never strand the tool result —
  sendResult always fires.

Reviewer-memo gates passed:
- #1 docstring rot: ensureTraceLogger and the dispatch comments
  match the implementation.
- #2 best-effort claim is upheld at both layers (logger try/catch
  + caller try/finally).
- #5 N/A — this commit doesn't add tables.

The 3-line wiring isn't separately tested; TraceLogger has 10
round-trip tests covering every behavior the wiring composes,
and the wiring has no branches. Suite 1422/34/0 unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 3, 2026
codegraph_node now always emits **Lines: N** when the indexed range is
known, and a new detail: 'preview' | 'full' arg controls how a fetched
body is rendered. Default preview truncates bodies >40 lines to the
first 30 lines plus a tail marker that names the override; full
preserves the prior verbatim behavior.
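The preview rule stated above is small enough to sketch directly (the tail-marker wording here is an assumption; only the 40-line threshold, 30-line prefix, and detail arg come from the commit):

```typescript
// detail: 'preview' truncates bodies over 40 lines to the first 30 lines
// plus a tail marker naming the override; 'full' is verbatim.
function renderBody(body: string, detail: "preview" | "full"): string {
  const lines = body.split("\n");
  if (detail === "full" || lines.length <= 40) return body;
  const tail = `… (${lines.length - 30} more lines; pass detail: 'full' for all ${lines.length})`;
  return [...lines.slice(0, 30), tail].join("\n");
}
```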

Lets agents skip a code: true round-trip when **Lines:** alone is
enough, and bounds the response when fetching a large symbol —
backlog #2 (smart source-snippet truncation). CLI mirror gains
--detail to match.

Reviewer-flagged: tail-marker total now prefers locFromRange over
the body's split count so the two numbers always agree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 4, 2026
Resumes the QueryBuilder → per-domain queries-*.ts migration that was
partially done across earlier waves. This batch handles the centrality
domain.

Extracted to `src/db/queries-centrality.ts`:
- `applyCentralityScores(qb, scores)` — wraps the per-row UPDATE in a
  single transaction + clears the node cache.
- `clearCentrality(qb)` — UPDATE nodes SET centrality = NULL + clears
  cache.

Both follow the established pattern (free function taking `qb`; uses
`qb.db` directly + `qb.clearCache()` for cache invalidation).

Removed entirely (per reviewer-memo §7 / dead exports):
- `getTopNodesByCentrality(opts)` — added speculatively, zero in-tree
  callers.
- `getCentralityRank(nodeId)` — same. Both can be re-added with a
  concrete caller in the same diff if a future feature needs them.

Caller updated:
- `src/index-hooks/centrality.ts` switches from
  `ctx.queries.applyCentralityScores(...)` / `ctx.queries.clearCentrality()`
  to the imported free functions.

QueryBuilder shrinks by 4 methods. Suite 1634/34/0 unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 4, 2026
Continues the QueryBuilder → per-domain queries-*.ts migration.
Metadata domain (4 methods, ~25 call sites across 9 files) ships in
this commit.

Extracted to `src/db/queries-metadata.ts`:
- `getMetadata(qb, key)` — read project_metadata
- `setMetadata(qb, key, value)` — upsert
- `getAllMetadata(qb)` — full snapshot
- `getStaleArtifactsCount(qb)` — derived rollup of summary/embedding/
  finding rows whose source_content_hash is behind the file's current
  content_hash. Used by the freshness gate.

All four are pure DB ops on `project_metadata` / joined views — no
per-instance caches, no shared statement slots — so the extraction is
mechanical.

Caller updates (9 src files + 6 test files):
- src/freshness.ts (2 sites)
- src/biomarkers/index.ts (2 sites)
- src/mcp/tools/status.ts (1 site)
- src/viewer/server.ts (2 sites)
- src/index-hooks/{churn,centrality,cochange,issue-history}.ts (12 sites)
- src/extraction/index.ts (7 sites)
- __tests__/{cochange,freshness,freshness-stress,freshness-v2-stress,
  issue-history,churn}.test.ts (8 sites)

QueryBuilder shrinks by 4 methods. No shims kept (the migration goal
is to actually shrink the class, not paper it over). Suite 1634/34/0
unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 8, 2026
…l failure

Stress test surfaced 224 nodes in the live index reporting as
"embed errors". Investigation revealed only 7 of those had inputs
exceeding the user's `llama-server --batch-size 512` cap — the
other 217 were collateral damage. When a batch of 32 contained one
or two over-length inputs, the WHOLE batch failed and every row
in it was marked `errored`. So a 7-row capability miss inflated
into a 224-row "errors" report and 217 healthy rows never got
embedded.

Fix: when `client.embed(batch)` fails with an "input too large /
exceeds batch size / context length" server error, fall back to
per-row retry. Each row is embedded individually; rows that still
fail get counted as `skipped` (NEW counter, separate from
`errors`) since they're a server-capability miss the user can fix
by increasing the embed server's --batch-size, not a pipeline
failure. Rows that succeed in the per-row pass embed normally
through `upsertSymbolEmbedding`, including the new
`summary_hash_at_embed` value from migration 035.

Result on the live codegraph index:
- Before:  0 generated / 224 errors / 0 skipped (ambiguous)
- After:   355 generated / 0 errors / 7 skipped (precise)

Also updated counter accuracy per the reviewer-memo's recurring
scrutiny area #2: "errors" no longer bumps for server-capability
misses; the new `skipped` counter covers those, and the CLI / MCP
embed-tool output surfaces an actionable hint ("increase
--batch-size on the server to embed these"). EmbedResult /
EmbedResultRow types gain `skipped: number`.

`isInputTooLargeError` regex matches the llama.cpp variant
(`too large to process`) plus generic shapes (`exceeds batch
size`, `input length exceeds`, `context length`) so other
server backends with similar caps trip the same fallback.

+1 test in embed-all-nodes.test.ts: the new size-capped fake
server returns the exact llama.cpp error shape; the test asserts
`errors=0`, `skipped>0`, `generated>0`, and
`candidates == generated + skipped`. Suite 1727/34/0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 8, 2026
…arch arc)

Three additions inspired by the 2026-05-08 code-KG paper sweep
(CodeRAG, GraphCodeAgent, GraphGen4Code, Maarleveld GNN survey,
CodexGraph, FalkorDB). Each verified absent before implementation;
all opt-in to keep default structural traversals unchanged.

#1 — In-graph similar_to edges (similarity as graph hops)
  - New EdgeKind similar_to (confidence=INFERRED, metadata.score)
  - buildSimilarToEdges reuses findSimilarViaVec; delete+insert are
    wrapped in db.transaction so the replacement is atomic
  - Migration 036 partial-index on edges(source,kind) WHERE
    kind=similar_to — guarded by sqlite_master existence check for
    the pre-016 hand-rolled migration tests
  - Surfaced as codegraph_admin({action: build-similarity-edges})
    and CLI codegraph admin build-similarity-edges --k --min-score
  - EXCLUDED_EDGE_KINDS keeps it out of default traversals; explicit
    edgeKinds bypass the filter

#2 — mode=intent search over symbol_summaries
  - FTS5 virtual table summary_fts (porter unicode61, mirrors nodes_fts)
    + INSERT/UPDATE/DELETE triggers on symbol_summaries
  - Migration 037 + parallel schema.sql entry for fresh-init path
  - bm25-ranked, optional kind/language/pathFilter filters
  - pathFilter LIKE uses canonical backslash-escape pattern matching
    queries-findings.ts:227-232 (no _/% injection)
  - Refuse-when-empty error points at codegraph summarize
  - FTS5 query parse errors caught and re-surfaced as a clear
    syntax-error message
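The backslash-escape pattern for `pathFilter` can be sketched as follows; the helper names are hypothetical, but the escaping convention matches the one referenced from queries-findings.ts:

```typescript
// Escape the escape char first, then the SQL LIKE wildcards, so a raw
// path containing `_` or `%` can't act as a wildcard injection.
function escapeLikePattern(raw: string): string {
  return raw.replace(/\\/g, "\\\\").replace(/[%_]/g, (ch) => "\\" + ch);
}

// Used with: WHERE file_path LIKE ? ESCAPE '\'
function buildPathFilterParam(pathFilter: string): string {
  return `%${escapeLikePattern(pathFilter)}%`;
}
```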

#3 — Intra-procedural def_use edges (TS/JS/TSX/JSX)
  - New EdgeKind def_use as self-loops on function/method nodes;
    metadata carries name, defLine, useLines
  - Scope-bounded extractor in src/extraction/def-use.ts, called
    from both tsExtractFunction and tsExtractMethod
  - Skips parameters (function inputs), fields (covered by
    field_access), nested-scope vars (belong to inner function set)
  - EXCLUDED_EDGE_KINDS opt-in; no traversal helper assumes
    source != target

Schema-version assertions bumped 35 → 37 in foundation.test.ts and
pr19-improvements.test.ts.

Suite 1742/0/34 (was 1729 baseline, +13 new tests). Three reviewer
rounds: round 1 caught LIKE-escape, atomicity, field rename, FTS5
try/catch; round 2 caught a JSDoc rot from the rename and a
contradictory test assertion; round 3 APPROVE.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 8, 2026
…ge-level diff (eval arc)

Three additions with pre-set hypotheses + post-impl measurement, per
the user's "evaluation and measurement how much useful it will
actually be" brief. Pre-impl thresholds were committed before code so
post numbers couldn't be motivated. One feature deferred with honest
documented reasoning.

#1 Selective parse-cache invalidation
  - clearParseCache(qb, language?) returns deleted-row count
  - CLI: --clear-parse-cache [language] (boolean OR optional value)
  - MCP: clearParseCache: boolean + clearParseCacheLanguage: string
    (typed schema doesn't allow oneOf — split into two args; language
     wins when both set)
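The boolean-OR-optional-value flag shape can be sketched like this (hypothetical hand-rolled parser; the real CLI has its own argument handling):

```typescript
// `--clear-parse-cache` alone means "clear everything";
// `--clear-parse-cache typescript` means "clear one language".
type ClearSpec = { clear: boolean; language?: string };

function parseClearParseCache(argv: string[]): ClearSpec {
  const i = argv.indexOf("--clear-parse-cache");
  if (i === -1) return { clear: false };
  const next = argv[i + 1];
  // A following token is the language unless it is another flag.
  if (next !== undefined && !next.startsWith("--")) {
    return { clear: true, language: next };
  }
  return { clear: true };
}
```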

  Hypothesis (pre-set): >=3x wall-clock speedup, <30s absolute
  Measured (codegraph repo, 463/498 = 93% TS in parse cache):
    - TS-only clear: 3.70s wall / 3.77s user-CPU
    - Full clear:    3.87s wall / 8.13s user-CPU
    - Wall ratio: 1.04x (parallelism masks the work delta)
    - User-CPU ratio: 2.16x more work for full clear
  Verdict: speed threshold NOT MET on monoglot testbed. Real value
  here is correctness (targeted invalidation when an extractor
  changes). On polyglot repos at 50% target-lang ratio, expected ~2x
  wall-clock speedup.

#1.5 Docstring source for mode='intent' (user follow-up: "make intent richer")
  - Migration 038: docstring_fts FTS5 over nodes.docstring + INS/UPD/DEL
    triggers (with WHEN docstring IS NOT NULL AND != ''); schema.sql parity
    for fresh-init path; pragma_table_info guard for pre-016 hand-rolled
    migration test setups
  - _search-intent.ts queries BOTH summary_fts AND docstring_fts,
    UNIONs by node_id keeping best rank, surfaces 'via summary' /
    'via docstring' provenance label per result
  - Empty-corpus check fixed: FTS5 external-content COUNT(*) reads
    from the source table, not the actual indexed rows — switched to
    direct content count

  Hypothesis (pre-set): >=30% recall increase
  Measured (20 hand-picked intent queries, codegraph corpus):
    - Summary-only hits: 22
    - Docstring-only hits: 34
    - Combined unique-node hits: 56 (2.55x = 155% improvement)
  Verdict: well above threshold. Best ROI of this arc — docstrings
  cover 26% of nodes vs summaries' 18%, AND describe intent verbatim
  in JSDoc / Python docstrings / Go comments.

#2 Edge-level diff in compare_to_ref
  - EdgeDelta / EdgesDelta types
  - diffEdgeLists keyed by a stable
    (srcQualName::srcKind=>tgtQualName::tgtKind::edgeKind) key so line
    shifts don't surface as spurious changes
  - Cross-file edges out of scope (compareToRef is per-file)
  - Opt-in via includeEdges: true (CLI: --include-edges)
  - Renderer surfaces source -> target node IDs (round-1 reviewer
    finding: discarded data; fixed)
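The stable-key diff can be sketched as below, with types simplified to the key fields (the real EdgeDelta carries more):

```typescript
// Keying on qualified names + kinds (no line numbers) means a pure line
// shift produces identical keys and therefore no spurious delta.
type EdgeRec = {
  srcQualName: string; srcKind: string;
  tgtQualName: string; tgtKind: string;
  edgeKind: string;
};

const edgeKey = (e: EdgeRec) =>
  `${e.srcQualName}::${e.srcKind}=>${e.tgtQualName}::${e.tgtKind}::${e.edgeKind}`;

function diffEdgeLists(before: EdgeRec[], after: EdgeRec[]) {
  const beforeKeys = new Set(before.map(edgeKey));
  const afterKeys = new Set(after.map(edgeKey));
  return {
    added: after.filter((e) => !beforeKeys.has(edgeKey(e))),
    removed: before.filter((e) => !afterKeys.has(edgeKey(e))),
  };
}
```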

  Hypothesis (pre-set): >=30% additional info (>=20% loose)
  Measured (HEAD vs HEAD~3 on this branch):
    - Node changes: 21+11+308 = 340
    - Edge changes (NEW signal): 83+11 = 94
    - Files surfaced: 22 -> 27 (+5 visible only via edge changes)
    - Information gain: 94/340 = 27.6%
  Verdict: >=20% threshold MET; just below >=30% strict. The 5
  newly-surfaced files (pure-edge changes) are the qualitative win.

#3 Stack-graphs cross-language resolver — DEFERRED
  Survey of the codegraph corpus: monoglot TypeScript. child_process
  invocations are ~30 git execFileSync calls; no Python/Ruby/Go
  spawn targets. Dynamic imports are NPM packages. string_imports
  table is dominated by test fixtures. Conclusion: this corpus
  lacks the ground-truth cross-language references needed to
  measure a scope-graph rule meaningfully. Building infrastructure
  without testable signal would be speculative abstraction (CLAUDE.md
  anti-pattern). Stays on the borrow-ideas backlog as the
  long-horizon item; not blocked, just not session-feasible without
  a polyglot testbed.

Schema-version assertions bumped 37 -> 38 in foundation.test.ts and
pr19-improvements.test.ts.

Suite 1746/0/34 (was 1742; +4 new tests). Two reviewer rounds: round
1 caught the edge-delta formatter discarding source/target IDs;
round 2 APPROVE. Info-level note tracked for later: summary_fts_au
trigger could mirror the docstring_fts_au SELECT-WHERE guard pattern
for consistency.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 8, 2026
…ulk codegraph_at_range

Three additive surface extensions to cut investigation round-trips:

#1 codegraph_node accepts `symbols: string[]` (up to 20) alongside
   the existing single-symbol `symbol`. Duplicate inputs that resolve
   to the same node are merged. Saves N round-trips when checking a
   list of suspect symbols.
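The duplicate-merge can be sketched as a dedup on resolved node id (resolver and card shape are stand-ins here):

```typescript
// Two input symbols that resolve to the same node yield one card;
// unresolvable symbols are dropped; input is capped at `max`.
type NodeCard = { nodeId: string; symbol: string };

function resolveMany(
  symbols: string[],
  resolve: (s: string) => NodeCard | undefined,
  max = 20,
): NodeCard[] {
  const seen = new Map<string, NodeCard>();
  for (const sym of symbols.slice(0, max)) {
    const card = resolve(sym);
    if (card && !seen.has(card.nodeId)) seen.set(card.nodeId, card);
  }
  return [...seen.values()];
}
```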

#2 codegraph_node accepts four new inline-expansion flags
   (`includeCallers` / `includeCallees` / `includeFindings` /
   `includeTests`). When set, the response folds in the corresponding
   tool's answer under each card, capped per-section to keep token
   pressure low (10 callers, 10 callees, 5 findings, 5 test files).
   Collapses 3-5 round-trips into one for "tell me everything about X"
   patterns; the dedicated tools remain available for the full lists.

#3 codegraph_at_range accepts `ranges: Array<{file, startLine, endLine}>`
   (up to 100) alongside the single-range form. Output renders one
   subsection per range so the agent can map results back to specific
   diff hunks. PR review with N hunks goes from N+1 calls to 1.

All three paths are additive — the legacy single-input shapes are
preserved verbatim. Backward-compat is locked in by the existing tests
plus 19 new ones (8 for node multi/expansions, 5 for bulk at-range,
+6 from refactoring). Docs updated in CLAUDE.md, README.md, and the
server-instructions playbook.

Suite 1772/0/34. No schema migration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 9, 2026
New module src/llm/commit-intent.ts. Classifies a commit message
into one of 8 buckets: feat / fix / refactor / perf / test / docs /
chore / unknown. Heuristic-first, no ML model.

Public API:
- classifyCommitMessage(message) → { intent, score, reason }
- 7 conventional-commits prefixes (feat:, fix:, refactor:, perf:,
  test:, docs:, chore: / build: / ci:) → score 0.95
- Keyword cues for unprefixed messages (add/implement → feat,
  fix/resolve → fix, refactor/rename → refactor, etc.) → score 0.6
- Body-cue fallbacks ("Closes #N" → fix, "BREAKING CHANGE:" → feat)
- Default unknown when nothing matches
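The priority ordering above can be condensed into a sketch (cue lists abbreviated; the shipped module covers more keywords and documents the ordering inline):

```typescript
type Intent =
  | "feat" | "fix" | "refactor" | "perf" | "test" | "docs" | "chore" | "unknown";
type Classification = { intent: Intent; score: number; reason: string };

// `[(:]` after the word keeps "Fix the token..." from matching `fix:`.
const PREFIX = /^(feat|fix|refactor|perf|test|docs|chore|build|ci)[(:]/i;
const KEYWORDS: Array<[RegExp, Intent]> = [
  [/\b(add|implement)\b/i, "feat"],
  [/\b(fix|resolve)\b/i, "fix"],
  [/\b(refactor|rename)\b/i, "refactor"],
];

function classifyCommitMessage(message: string): Classification {
  const subject = message.split("\n")[0];
  const m = PREFIX.exec(subject);
  if (m) {
    const p = m[1].toLowerCase();
    const intent = (p === "build" || p === "ci" ? "chore" : p) as Intent;
    return { intent, score: 0.95, reason: `conventional prefix ${p}` };
  }
  for (const [re, intent] of KEYWORDS) {
    if (re.test(subject)) return { intent, score: 0.6, reason: "keyword cue" };
  }
  if (/^BREAKING CHANGE:/m.test(message))
    return { intent: "feat", score: 0.6, reason: "body cue" };
  if (/^Closes #\d+/m.test(message))
    return { intent: "fix", score: 0.6, reason: "body cue" };
  return { intent: "unknown", score: 0, reason: "no cue matched" };
}
```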

45 vitest cases. Friction notes: prefix character class tightened to
`[\(:]` so "Fix the token..." doesn't match the `fix:` prefix rule;
priority ordering documented inline.

Integration into co_changes / codegraph_history / codegraph_hotspots
to surface intent breakdown lands separately (tracked as Stage 7 #2
follow-up — needs a small migration to add intent column to co_changes).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 9, 2026
…#2)

Wires the heuristic commit-intent classifier (shipped in d12ee01) into
a SHA-keyed persistence layer:

- Migration 045: new `commit_intents (sha PK, intent, score, seen_at)`
  table with idx_commit_intents_intent for per-intent queries.
- schema.sql parity (recurring scrutiny pattern #5).
- Schema-version test assertions bumped 44 → 45.
- src/db/queries-commit-intents.ts (new): recordCommitIntent /
  recordCommitIntents (batch upsert with txn) / getCommitIntent /
  getCommitIntents (multi-SHA Map fetch) / aggregateIntentBreakdown
  (returns Record<intent, count> for codegraph_history-style folds) /
  clearCommitIntents (cochange full-rescan path).
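The breakdown fold has this shape (in the codebase it reads from `commit_intents`; here a plain array stands in for the rows):

```typescript
// Folds per-SHA intent rows into a Record<intent, count> suitable for
// the codegraph_history-style breakdown described above.
type IntentRow = { sha: string; intent: string; score: number };

function aggregateIntentBreakdown(rows: IntentRow[]): Record<string, number> {
  const out: Record<string, number> = {};
  for (const row of rows) out[row.intent] = (out[row.intent] ?? 0) + 1;
  return out;
}
```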

Mining-side integration deferred — `mineCoChanges` doesn't currently
emit subjects (only SHAs + file pairs). Adding a subject-capture pass
is tracked as a Stage 7 #2 follow-up.

Suite remains 2374/0/34. Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 10, 2026
…ge 7 #2)

Closes the Stage 7 #2 loop end-to-end: subjects flow from git-log
through to classified intents in commit_intents.

- src/cochange/index.ts: git-log format extended to
  `tformat:CGCMT-%H%x09%s` (TAB-separated SHA + subject); parser
  captures subjects per commit; mineCoChanges return shape gains a
  `subjects: Map<sha, subject>` field.
- src/index-hooks/cochange.ts: applyResults now classifies every
  freshly-mined subject via classifyCommitMessage and batch-upserts
  to commit_intents. Full-rescan path also clearCommitIntents to drop
  stale SHAs after a force-push / rebase.
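The subject-capture parse over the `tformat:CGCMT-%H%x09%s` output can be sketched like this (parser name hypothetical; file lines between commit markers are simply skipped):

```typescript
// Each commit emits one `CGCMT-<sha>\t<subject>` line (%x09 is TAB);
// everything else in the git-log output is ignored here.
function parseSubjects(gitLogOutput: string): Map<string, string> {
  const subjects = new Map<string, string>();
  for (const line of gitLogOutput.split("\n")) {
    if (!line.startsWith("CGCMT-")) continue;
    const tab = line.indexOf("\t");
    if (tab === -1) continue;
    subjects.set(line.slice("CGCMT-".length, tab), line.slice(tab + 1));
  }
  return subjects;
}
```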

Heuristic-only classifier — no LLM call, runs inline with mining.
The persistence layer (Stage 7 #2 foundation, 6af1924) stays
unchanged; this commit just produces the input rows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 10, 2026
…Stage 7 #2/#3 follow-ups)

Two new local-NLI surfaces. Both use the existing bart-large-mnli
zero-shot classifier; both keep the heuristic / rule-based path
on the happy case and only consult the model when the deterministic
rules can't reach a confident answer. Token cost vs the chat
backend: ~0 per call.

# C — commit-intent NLI fallback

`classifyCommitMessageWithFallback(message, classifier?)` runs the
existing heuristic first; when it returns `'unknown'` (~30% of
commits in messy histories: "wip", "stuff", "more"), feeds the
subject into a 7-hypothesis NLI classifier:

  feat     → "a new feature or capability"
  fix      → "a bug fix or error correction"
  refactor → "code restructuring or cleanup without behaviour change"
  perf     → "a performance improvement or optimisation"
  test     → "tests, specs, or test infrastructure"
  docs     → "documentation, comments, or readme"
  chore    → "dependency bumps, build config, or routine chores"

Confidence floor 0.45 — below that we keep `'unknown'` rather than
ascribe a low-confidence label (avoids polluting commit_intents
with junk on truly-opaque commits). Wired into the cochange index
hook: classifier is constructed from `localRoleLlm` config when
present; otherwise the hook stays heuristic-only and
behaviour-identical to before.

The original sync `classifyCommitMessage` is unchanged — existing
callers + tests continue to use the heuristic path.
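The fallback wiring and the 0.45 floor can be sketched with an injected classifier (interface and hypothesis map simplified; only two of the seven hypotheses shown):

```typescript
type Intent =
  | "feat" | "fix" | "refactor" | "perf" | "test" | "docs" | "chore" | "unknown";

interface ZeroShot {
  classifyLabels(
    text: string,
    labels: string[],
  ): Promise<{ label: string; score: number }>;
}

const HYPOTHESES: Record<string, Intent> = {
  "a new feature or capability": "feat",
  "a bug fix or error correction": "fix",
  // ...remaining five hypotheses elided
};

async function classifyWithFallback(
  message: string,
  heuristic: (m: string) => { intent: Intent; score: number },
  classifier?: ZeroShot,
  floor = 0.45,
): Promise<{ intent: Intent; score: number }> {
  const h = heuristic(message);
  if (h.intent !== "unknown" || !classifier) return h; // heuristic wins
  const top = await classifier.classifyLabels(message, Object.keys(HYPOTHESES));
  if (top.score < floor) return h; // keep 'unknown' over a junk label
  return { intent: HYPOTHESES[top.label] ?? "unknown", score: top.score };
}
```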

# D — new structured change-kind classifier

`classifyChangeKind({classifier, beforeBody, afterBody, name, kind})`
in `src/llm/change-kind.ts`. Distinct from the existing
generative `summarizeChange` (which produces prose via the chat
backend): this produces a STRUCTURED label suitable for grouping /
filtering / metrics:

  addition | removal | modification | refactor
  | signature_change | behavioral_change | doc_only | unknown

Rule-based dispatch handles the trivial cases (empty before →
addition, empty after → removal, identical → unknown) without an
NLI call. The remaining cases consult bart-large-mnli with 4
prose hypotheses against the diff. Same 0.45 confidence floor;
sub-threshold tops fall through to `'modification'`.
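The rule-based fast path looks roughly like this (sketch only; the NLI branch is stubbed as a `null` return since it needs the model, same shape as the commit-intent fallback):

```typescript
type ChangeKind =
  | "addition" | "removal" | "modification" | "refactor"
  | "signature_change" | "behavioral_change" | "doc_only" | "unknown";

// Trivial cases resolve without an NLI call; `null` means "consult the
// zero-shot classifier with the diff".
function ruleBasedChangeKind(beforeBody: string, afterBody: string): ChangeKind | null {
  if (!beforeBody && afterBody) return "addition";
  if (beforeBody && !afterBody) return "removal";
  if (beforeBody === afterBody) return "unknown";
  return null;
}
```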

# Supporting refactors

- `LocalRoleClassifier.classifyLabels(text, labels)` — new method
  that runs zero-shot against an arbitrary caller-supplied label
  set (vs the existing `classify()` which is hardcoded to the
  7-class role taxonomy). The classifier wraps the same pipeline,
  so both surfaces share one model load.

- `IndexHook` afterIndexAll/afterSync are already async-aware in
  the registry; the cochange hook had been synchronous. Made
  `applyResults`/`applyFullRescan`/`applyIncremental`/`refresh`
  async so the NLI fallback can await per-commit. Behaviour
  unchanged when `localRoleLlm` is unset.

# Tests

+ `__tests__/commit-intent-fallback.test.ts` — 13 cases covering
  short-circuit, NLI dispatch, low-confidence floor, error
  degradation, all 7 intent labels.
+ `__tests__/change-kind.test.ts` — 12 cases covering rule-based
  dispatch, NLI dispatch, low-confidence floor, error paths.
+ `__tests__/local-role-classifier.test.ts` — 7 new cases for
  `classifyLabels` (custom label set, short-circuit, abort,
  unknown labels NOT coerced to ROLE_LABELS).

# Why now

Web research (HF API + WebSearch agent) confirmed that for the
in-process transformers.js model surface, five small specialized
models is the realistic ceiling: no code-tuned summarizer or
embedder ships in transformers.js-loadable form today.
remaining win for the existing surface was unifying everything that
used to chat-call onto the one bart-large-mnli load — both C and D
fit that shape.
andreinknv added a commit that referenced this pull request May 10, 2026
…L rebuild script

Three deliverables from the gap-B speed/size investment session:

1. `scripts/spikes/qwen-coder-bench.mjs` — multi-dtype + multi-device
   bench harness for Qwen2.5-Coder-0.5B-Instruct loaded in-process
   via @huggingface/transformers. Probes the locally-converted
   ONNX (fp32 / q8 / etc.) on the same 5 curated codegraph snippets,
   reports per-dtype median/p95 latency + side-by-side outputs.

2. `src/llm/local-causal-llm-client.ts` — new client class that
   wraps the `text-generation` pipeline + chat templating for
   small instruction-tuned causal LMs. Sibling to LocalSummaryClient
   (encoder-decoder seq2seq) and RerankerClient (cross-encoder).
   Default model intentionally empty — operators must supply a
   re-exported HF id (the recipe is in the qwen-coder-probe.mjs
   commit message). Not yet wired into the summarizer phase; lands
   alongside the integration commit.

3. `scripts/spikes/coreml-rebuild.sh` — recipe to rebuild
   onnxruntime-node from source with --use_coreml so transformers.js
   gets `device:'coreml'` on Apple Silicon. The pre-built npm
   binary doesn't bundle CoreML EP. Vendoring the rebuilt .node
   file into node_modules is the deployment path.

# INT8 quantize bench result (today's deliverable)

ORT dynamic INT8 quantize via `optimum-cli onnxruntime quantize
--arm64 --per_channel` produces `model_quantized.onnx` (~605MB,
~4× smaller than the 2.3GB fp32 model). Bench against the fp32 baseline:

| Config       | Cold load | Median per-call | Quality |
|--------------|-----------|-----------------|---------|
| cpu / fp32   | 2707ms    | 271ms           | clean   |
| cpu / q8     |  733ms    | 273ms           | regressed |

Two findings:

- **q8 is not faster per-call.** Only cold-load time and disk footprint
  shrink (~4× each). NEON fp32 is well-optimised in the onnxruntime CPU
  EP; dynamic INT8 reduces memory but doesn't speed up compute on this path.

- **q8 regresses instruction-following.** sum() produces a
  re-generation of the source code instead of a summary;
  classifyCommitMessage produces "To summarize the provided code,
  I'll break it down..." instead of the requested 1-line.
  Classic small-instruct-LLM failure mode under dynamic
  quantization — precision loss in attention layers degrades
  adherence to system-prompt formatting constraints.

Decision: **do not ship q8**. Keep fp32 as the production path
despite the 2.3GB footprint; quality is what matters. Real speedup
needs CoreML EP (next task #2).

# Probe script update

`scripts/spikes/qwen-coder-probe.mjs` got a tighter system prompt
(`OUTPUT FORMAT: ONE LINE...`) and `max_new_tokens=30` cap that
together produce clean 1-liners on fp32:

  - sum                    → "Sum an array of numbers."
  - FileLoader.load        → "Load file from path"
  - buildSimilarToEdges    → "Build Similar To Edges"
  - parseDaRecord          → "Parse DA record from a line and update coverage."
  - classifyCommitMessage  → "Classify commit message"

Same prompt + cap on q8 fails (model ignores OUTPUT FORMAT).

# Out of scope (next task)

CoreML EP rebuild — `scripts/spikes/coreml-rebuild.sh` is the
recipe; expected ~3-5× speedup over CPU fp32 on Apple Silicon
once the binding is vendored. Documented but not run in this
commit.

# Footprint

5 files added/modified, ~600 LOC:
- 2 spike scripts (bench, mlx-vs-onnx)
- 1 rebuild script (coreml)
- 1 client class (LocalCausalLlmClient — wired in a follow-up commit)
- 1 probe script update (prompt tightening)