Skip to content

Add slice-v1 retrieval and benchmark tooling#144

Merged
mohanagy merged 2 commits into
mainfrom
payoff-v0-22
May 11, 2026
Merged

Add slice-v1 retrieval and benchmark tooling#144
mohanagy merged 2 commits into
mainfrom
payoff-v0-22

Conversation

@mohanagy
Copy link
Copy Markdown
Owner

@mohanagy mohanagy commented May 11, 2026

Summary

  • add opt-in slice-v1 retrieval plus MCP/CLI surface support and regression coverage
  • extend resolution: 'sketch' with deterministic env/config and side-effect hints
  • add real-workspace benchmark tooling, graph stats helpers, and value-per-token calibration reporting

Verification

  • npm run typecheck
  • npm run build
  • npm run test:run
  • bash docs/benchmarks/2026-05-11-spi-vs-legacy/run.sh

Benchmark notes

  • bundled fixture still does not show token reduction: legacy 947 vs SPI cold 1002
  • bundled fixture still shows smaller graph output: 67,254 bytes -> 45,589 bytes
  • calibration stayed honest on the fixture: 0 helps / 7 no material change / 0 hurts-or-expands

Notes

  • slice-v1 is opt-in and experimental
  • real-workspace benchmark flow uses env vars only; no private paths or artifacts are committed

Summary by CodeRabbit

  • New Features

    • Opt-in task‑conditioned slicing retrieval mode ("slice‑v1") exposed in CLI and stdio tools with validation.
    • Sketch mode improved with deterministic env/config reads and side‑effect/latency hints.
    • Real‑workspace benchmark flow: runner, bundling, and consolidated reporting with node/edge metrics and calibration buckets.
  • Documentation

    • Updated benchmark reporting, reproducibility docs, example prompts, and a report template for real‑workspace comparisons.

Review Change Stack

Add opt-in slice-v1 retrieval, richer sketch semantics, real-workspace benchmark helpers, and calibration/diagnostic coverage for the post-v0.21 payoff layer.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 11, 2026

📝 Walkthrough

Walkthrough

This PR adds an opt-in task-conditioned retrieval mode "slice-v1" that anchors and traverses a KnowledgeGraph to produce ordered slice node lists and metadata, exposes retrieval_strategy/slice through CLI/stdio/command paths with validation, enriches sketch output with deterministic env/side-effect hints, and introduces real-workspace benchmark tooling (edge-counts, multi-workspace runner/summarizer, probe calibration) plus tests and docs.

Changes

Task-Conditioned Slice-v1 Retrieval Strategy

Layer / File(s) Summary
Slice Metadata & Contracts
src/contracts/context-pack.ts
Adds ContextPackRetrievalStrategy union and slice metadata interfaces; CompiledContextPack extended with optional retrieval_strategy and slice.
CLI Parser
src/cli/parser.ts
Adds PackCliOptions.retrievalStrategy, updates usage/help, parses `--retrieval-strategy [default
Context-Pack Command Forwarding
src/infrastructure/context-pack-command.ts
Expands retrieveContext options typing to accept retrievalStrategy; runContextPackCommand forwards it in non-review tasks and rejects it for task=review.
MCP Tool Schema
src/runtime/stdio/definitions.ts
Adds retrieval_strategy property to retrieve and context_pack tool input schemas with enum ['default','slice-v1'] and adjusts descriptions.
Stdio Handlers
src/runtime/stdio/tools.ts
Adds parseRetrievalStrategyParam helper; validate retrieval_strategy param, reject invalid values with jsonrpcInvalidParams, disallow for task=review, and forward validated retrievalStrategy into retrieval calls.
Slicing Policy & Algorithm
src/runtime/retrieve/slicing.ts
New SliceScoredNode shape and sliceCandidatesForRetrieve implementing anchor selection (up to 2 anchors), intent→policy mapping, barrel suppression, BFS traversal with relation allowlists, helper neighbor expansion, path recording, and returning ordered_ids + metadata or null.
Retrieve Integration
src/runtime/retrieve.ts
RetrieveOptions gains retrievalStrategy; RetrieveResult includes retrieval_strategy and slice; adds scoredNodeFromGraph; implements slice-v1 branch in retrieveContext using sliceCandidatesForRetrieve, preserves lexical slice during semantic rerank by filtering semantic IDs, injects slice metadata into results, and propagates strategy/slice in compact outputs and compiled packs.
Deterministic Sketch Emission
src/runtime/context-pack-resolution.ts
Adds internal sideEffectHints helper; separates readsEnv from config, computes side-effect and latency-sensitive hints, expands behavior_sketch emission condition, and appends reads env:, side effects:, and latency-sensitive: lines when present.
Slice-v1 Unit Tests
tests/unit/retrieve-slice-v1.test.ts
Tests explain/debug/impact slice modes using a deterministic buildSliceGraph fixture; asserts anchor-bounded selection, barrel suppression, direction-aware traversal, and presence of slice metadata fields.
Surface & Integration Tests
tests/unit/retrieve-slice-surface.test.ts, tests/unit/stdio-slice-surface.test.ts
Tests CLI/pack parser accepts/rejects --retrieval-strategy, runContextPackCommand behavior for explain/reject for review, stdio retrieve/context_pack validate/forward/disallow strategy, and temp-graph fixture setup/cleanup.
Sketch Enhancement Tests
tests/unit/context-pack-resolution-sketch.test.ts
Adds tests verifying behavior_sketch includes deterministic reads env and config markers, side-effect and latency-sensitive hints for inferred HTTP/LLM/DB operations, and that contains-only relationships do not spuriously infer side-effects.

Real-Workspace Benchmark Infrastructure

Layer / File(s) Summary
Changelog & README
CHANGELOG.md, README.md
Document Unreleased slice-v1, deterministic sketch changes, safe CLI/MCP exposure, and updated README benchmark wording (removes generic pack-token claim; reports framework correctness, retrieval expansion, graph size −32%, cache-hit rebuild −27%, and no explain-pack token win for fixture).
Graph Stats CLI
docs/benchmarks/.../graph-stats.mjs
New Node.js CLI that reads a graph.json path, safely parses JSON, computes node_count and edge_count (0 defaults), and prints JSON to stdout; usage/parse exit codes implemented.
Runner: edge_count integration
docs/benchmarks/.../run.sh
run.sh now uses graph-stats.mjs for node/edge counting across legacy/spi-cold/spi-warm, writes edge_count into per-variant JSON, logs edges, and uses timestamped RESULTS_DIR with optional override.
Probe Extensions
docs/benchmarks/.../probe.mjs
Adds resolution token estimates and top_files, runs additional retrieval with retrievalStrategy: 'slice-v1' (value-per-token selection), records slice deltas, classifies calibration buckets (helps/hurts_or_expands/no_material_change), and includes calibration in final output.
Real-Workspace Orchestration
docs/benchmarks/.../run-real-workspace.sh, summarize-real-workspaces.mjs
run-real-workspace.sh validates prompts/workspace inputs, runs per-workspace run.sh with env forwarding into timestamped results bundle; summarize-real-workspaces.mjs orders workspaces (backend, monorepo, lexicographic fallback), reads/parses per-workspace summary.json, flattens objective_metrics (build_time_ms, graph_size_bytes, node_count, edge_count), and emits aggregated JSON with qualitative notes.
Prompts & Report Template
docs/benchmarks/.../prompts.real-workspace.example.json, REAL_WORKSPACE_REPORT_TEMPLATE.md
Adds example prompts JSON (schema_version:1 and typed prompt entries) and a report template covering workspace matrix, strategy/resolution comparisons, retrieval-level comparisons, calibration checklist, and privacy/disclaimer guidance.
Benchmark README
docs/benchmarks/.../README.md
Documents that variant JSON includes edge_count and documents the Real-workspace matrix runner (env vars, defaults, produced artifacts).
Benchmark Tests
tests/unit/benchmark-graph-stats.test.ts, tests/unit/benchmark-real-workspace.test.ts, tests/unit/benchmark-probe-calibration.test.ts
Adds unit tests for graph-stats (edges present, edges omitted, malformed JSON error), integration-style tests for summarize-real-workspaces ordering/metrics and fail-fast validations, prompt/template schema checks, and calibration classifier tests.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

"🐰 I hop and mark each anchored node,
I trace the paths where slices roam,
I count the edges, guard private code,
My sketches hum of env and home.
Benchmarks gather workspaces near—
A tidy slice, the result is clear."

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 2.56% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main changes: adding slice-v1 retrieval and benchmark tooling, which aligns with the core objectives.
Description check ✅ Passed The PR description covers all required template sections with concrete details: summary lists three distinct changes, verification includes four specific commands, testing checkboxes are shown, and focused scope is documented.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch payoff-v0-22

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/infrastructure/context-pack-command.ts (1)

221-244: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

--retrieval-strategy is ignored for review packs.

The new flag is forwarded for explain/impact only. runContextPackCommand() returns early for task === 'review', so graphify-ts pack --task review --retrieval-strategy slice-v1 succeeds but has no effect. Either reject that combination or thread it into the review bundle path.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/infrastructure/context-pack-command.ts` around lines 221 - 244,
runContextPackCommand() currently ignores options.retrievalStrategy for task ===
'review' because it returns early; update the review path to either validate and
reject the combination or pass retrievalStrategy into the review bundle
retrieval call: locate runContextPackCommand(), the early-return branch that
handles task === 'review', and the code that builds retrieval options for
dependencies.retrieveContext (and any call sites of retrieveContext or
buildCommunityLabels/compactImpactResult used for review bundles) and ensure
options.retrievalStrategy (and options.retrievalLevel if relevant) are forwarded
into the retrieval options for review processing, or add an explicit validation
that throws a clear error when task === 'review' and retrievalStrategy is set so
callers know the flag is unsupported.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/benchmarks/2026-05-11-spi-vs-legacy/graph-stats.mjs`:
- Line 11: Wrap the JSON.parse call for graph (currently: const graph =
JSON.parse(readFileSync(graphPath, 'utf8'))) in a try/catch so malformed JSON
doesn't crash the CLI: read the file into a string first (using
readFileSync(graphPath, 'utf8')), then try to JSON.parse that string, catch
SyntaxError (or any error), print a clear error via console.error including
graphPath and the error.message, and exit with process.exit(1); ensure the
caught error path does not allow further processing of the invalid graph
variable.

In `@docs/benchmarks/2026-05-11-spi-vs-legacy/probe.mjs`:
- Around line 148-161: The classification logic mislabels non-neutral cases:
change the hurts/expands branch condition in the block that constructs note
(uses tokenDelta, qualityDelta, labelDelta, summary) from (tokenDelta > 0 &&
qualityDelta <= 0) to use OR so it catches either token expansions or quality
regressions; specifically, update the condition guarding
summary.hurts_or_expands.push(note) to (tokenDelta > 0 || qualityDelta < 0) so
quality regressions with token savings and token expansions with quality gains
are correctly classified.

In `@docs/benchmarks/2026-05-11-spi-vs-legacy/run-real-workspace.sh`:
- Around line 8-22: The current run_workspace function and PROMPTS_FILE
assignment can allow typos in GRAPHIFY_BENCH_REAL_PROMPTS,
GRAPHIFY_BENCH_BACKEND, or GRAPHIFY_BENCH_MONOREPO to silently pass through;
update run_workspace (and the PROMPTS_FILE resolution) to validate inputs and
fail fast: check that PROMPTS_FILE exists (-f) and that the workspace_path and
any required backend/monorepo paths exist (-d or -f as appropriate), print a
clear error message naming the missing variable (e.g.,
GRAPHIFY_BENCH_REAL_PROMPTS, GRAPHIFY_BENCH_BACKEND, GRAPHIFY_BENCH_MONOREPO)
and return non-zero/exit so run.sh is not invoked with invalid paths; keep
checks adjacent to the PROMPTS_FILE assignment and at the top of run_workspace
for quick failure.

In `@docs/benchmarks/2026-05-11-spi-vs-legacy/summarize-real-workspaces.mjs`:
- Line 26: The JSON.parse call inside workspaceNames.map can throw on malformed
summary files; wrap the readFileSync/JSON.parse logic used in workspaceNames.map
so each iteration uses a try-catch that captures errors reading/parsing
join(bundleDir, name, 'summary.json') and rethrows or logs a new Error that
includes the workspace name and the file path (e.g., mention name and
'summary.json') so you can identify which workspace failed; update the mapping
to return either the parsed object or propagate the enriched error for later
handling.

In `@src/runtime/context-pack-resolution.ts`:
- Around line 379-385: The side-effect inference currently uses behaviorEdges
(which includes structural labels like 'contains' and 'method') causing false
positives; instead build an execution-only edge set (e.g., execBehaviorEdges =
relationLabels(node, relationIndex, 'outgoing', ['calls']) or another allowlist
of execution-only labels) and pass that execBehaviorEdges into sideEffectHints()
(replace the current sideEffectHints(behaviorEdges) call) so side-effect tags
(queue_event, db_write, etc.) are derived only from executable/calls edges.

In `@src/runtime/retrieve.ts`:
- Around line 1835-1867: The semantic/rerank path currently rebuilds candidates
from all eligible nodes and ignores the lexical slice; update
retrieveContextAsync (and the code paths that call
buildRetrieveResultFromOrderedCandidates) so that when options.retrievalStrategy
=== 'slice-v1' you detect and propagate the lexical slice (lexicalResult.slice
or the sliced.ordered_ids + sliced.metadata returned by
sliceCandidatesForRetrieve) into the semantic/rerank flow and use that
ordered_ids list to constrain the candidate pool before reranking; ensure
sliced.metadata is forwarded into buildRetrieveResultFromOrderedCandidates and
that any fallback to scoredNodeFromGraph only occurs for IDs within the slice so
the slice-v1 constraint and its metadata are honored end-to-end.

In `@src/runtime/retrieve/slicing.ts`:
- Around line 182-185: The traversal code currently drops any neighbor not
already present in scoredCandidates by calling scoredById.get(neighborId) and
continuing when missing; change this so the slice can expand beyond the initial
candidate pool by performing a graph-backed neighbor lookup when scoredById
lacks neighborId (e.g., query the graph index / adjacency map for neighborId,
construct a candidate object for that node, compute its score via the existing
scoring routine, insert it into scoredCandidates/scoredById, and then proceed
with traversal); apply the same change to the other traversal branch that uses
scoredById (the block flagged around the second occurrence) so both paths can
pull in graph-adjacent evidence rather than treating scoredCandidates as a hard
boundary.

In `@src/runtime/stdio/tools.ts`:
- Around line 888-891: The code currently parses retrieval_strategy via
parseRetrievalStrategyParam and returns failure when invalid, but the review
flow (task: 'review') bypasses this check so context_pack can silently accept
unsupported values; update the review branch to validate retrieval_strategy
early: call parseRetrievalStrategyParam(helpers, toolArguments) (or reuse the
existing contextPackStrategy check) inside the review handling path and if it
returns 'invalid' call helpers.failure(id, helpers.jsonrpcInvalidParams,
'retrieval_strategy must be one of default, slice-v1'), or alternatively pass
the validated contextPackStrategy into the review flow so review honors
slice-v1. Ensure you reference parseRetrievalStrategyParam, contextPackStrategy,
helpers.failure and task: 'review' when making the change.

---

Outside diff comments:
In `@src/infrastructure/context-pack-command.ts`:
- Around line 221-244: runContextPackCommand() currently ignores
options.retrievalStrategy for task === 'review' because it returns early; update
the review path to either validate and reject the combination or pass
retrievalStrategy into the review bundle retrieval call: locate
runContextPackCommand(), the early-return branch that handles task === 'review',
and the code that builds retrieval options for dependencies.retrieveContext (and
any call sites of retrieveContext or buildCommunityLabels/compactImpactResult
used for review bundles) and ensure options.retrievalStrategy (and
options.retrievalLevel if relevant) are forwarded into the retrieval options for
review processing, or add an explicit validation that throws a clear error when
task === 'review' and retrievalStrategy is set so callers know the flag is
unsupported.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 70fe1576-27ea-49e9-abf2-0305a581fe5a

📥 Commits

Reviewing files that changed from the base of the PR and between d91ca67 and 965fde8.

📒 Files selected for processing (24)
  • CHANGELOG.md
  • README.md
  • docs/benchmarks/2026-05-11-spi-vs-legacy/README.md
  • docs/benchmarks/2026-05-11-spi-vs-legacy/REAL_WORKSPACE_REPORT_TEMPLATE.md
  • docs/benchmarks/2026-05-11-spi-vs-legacy/graph-stats.mjs
  • docs/benchmarks/2026-05-11-spi-vs-legacy/probe.mjs
  • docs/benchmarks/2026-05-11-spi-vs-legacy/prompts.real-workspace.example.json
  • docs/benchmarks/2026-05-11-spi-vs-legacy/run-real-workspace.sh
  • docs/benchmarks/2026-05-11-spi-vs-legacy/run.sh
  • docs/benchmarks/2026-05-11-spi-vs-legacy/summarize-real-workspaces.mjs
  • src/cli/parser.ts
  • src/contracts/context-pack.ts
  • src/infrastructure/context-pack-command.ts
  • src/runtime/context-pack-resolution.ts
  • src/runtime/retrieve.ts
  • src/runtime/retrieve/slicing.ts
  • src/runtime/stdio/definitions.ts
  • src/runtime/stdio/tools.ts
  • tests/unit/benchmark-graph-stats.test.ts
  • tests/unit/benchmark-real-workspace.test.ts
  • tests/unit/context-pack-resolution-sketch.test.ts
  • tests/unit/retrieve-slice-surface.test.ts
  • tests/unit/retrieve-slice-v1.test.ts
  • tests/unit/stdio-slice-surface.test.ts

Comment thread docs/benchmarks/2026-05-11-spi-vs-legacy/graph-stats.mjs Outdated
Comment thread docs/benchmarks/2026-05-11-spi-vs-legacy/probe.mjs Outdated
Comment thread docs/benchmarks/2026-05-11-spi-vs-legacy/run-real-workspace.sh
Comment thread docs/benchmarks/2026-05-11-spi-vs-legacy/summarize-real-workspaces.mjs Outdated
Comment thread src/runtime/context-pack-resolution.ts Outdated
Comment thread src/runtime/retrieve.ts
Comment thread src/runtime/retrieve/slicing.ts Outdated
Comment thread src/runtime/stdio/tools.ts Outdated
Tighten benchmark script failures, keep slice-v1 constrained through async retrieval, reject unsupported review retrieval_strategy usage, and limit sketch side-effect inference to execution edges.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
tests/unit/benchmark-graph-stats.test.ts (1)

33-43: ⚡ Quick win

Add an explicit non-array edges regression test.

You already cover the “missing edges” fallback. Adding a case where edges exists but is not an array would harden this contract and prevent silent regressions.

Proposed test addition
+  it('falls back to zero edges when graph.json has a non-array edges value', () => {
+    withTempGraph({
+      nodes: [{ id: 'a' }, { id: 'b' }],
+      edges: { source: 'a', target: 'b' },
+    }, (graphPath) => {
+      const output = execFileSync('node', [
+        'docs/benchmarks/2026-05-11-spi-vs-legacy/graph-stats.mjs',
+        graphPath,
+      ], { cwd: process.cwd(), encoding: 'utf8' })
+      expect(JSON.parse(output)).toEqual({ node_count: 2, edge_count: 0 })
+    })
+  })
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/benchmark-graph-stats.test.ts` around lines 33 - 43, Add a new
unit test (next to the existing "falls back to zero edges when graph.json omits
the edges array") that uses withTempGraph to write a graph JSON where edges is
present but not an array (e.g., edges: null or edges: {} or a string) and then
calls the same script
('docs/benchmarks/2026-05-11-spi-vs-legacy/graph-stats.mjs') via execFileSync;
assert JSON.parse(output) equals { node_count: 2, edge_count: 0 } to ensure the
graph-stats.mjs code path that reads edges treats non-array values the same as
missing arrays.
src/runtime/context-pack-resolution.ts (1)

382-383: ⚡ Quick win

Stabilize env/config label ordering for deterministic sketch output.

relationLabels(...) preserves relationship insertion order, so output can drift if upstream edge order varies. Sorting readsEnv and config before rendering will make this deterministic.

Proposed patch
-  const config = relationLabels(node, relationIndex, 'outgoing', ['uses_config'])
-  const readsEnv = relationLabels(node, relationIndex, 'outgoing', ['reads_env'])
+  const config = relationLabels(node, relationIndex, 'outgoing', ['uses_config']).sort((a, b) => a.localeCompare(b))
+  const readsEnv = relationLabels(node, relationIndex, 'outgoing', ['reads_env']).sort((a, b) => a.localeCompare(b))

Also applies to: 402-407

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/runtime/context-pack-resolution.ts` around lines 382 - 383, The output
can vary because relationLabels(node, relationIndex, 'outgoing',
['uses_config']) and relationLabels(node, relationIndex, 'outgoing',
['reads_env']) preserve insertion order; to make sketch output deterministic,
sort the arrays returned by relationLabels (e.g., sort config and readsEnv)
before any rendering/serialization. Locate places where config and readsEnv are
used (the calls to relationLabels and subsequent rendering logic around the
variables named config and readsEnv, plus the similar block around lines
402-407) and replace direct use with a stableSortedConfig and
stableSortedReadsEnv (or sort in-place) so downstream code always consumes a
consistently ordered list.
tests/unit/benchmark-real-workspace.test.ts (1)

135-137: ⚡ Quick win

Replace deprecated toThrowError matcher with toThrow across test files.

Vitest 4 treats toThrowError as a deprecated alias. Replace with toThrow to keep matcher usage current.

Also applies to tests/unit/benchmark-graph-stats.test.ts:53

Suggested diff
tests/unit/benchmark-real-workspace.test.ts:
-      })).toThrowError(expect.objectContaining({
+      })).toThrow(expect.objectContaining({
         stderr: expect.stringContaining('GRAPHIFY_BENCH_REAL_PROMPTS'),
       }))
...
-      })).toThrowError(expect.objectContaining({
+      })).toThrow(expect.objectContaining({
         stderr: expect.stringContaining('GRAPHIFY_BENCH_BACKEND'),
       }))
...
-      })).toThrowError(expect.objectContaining({
+      })).toThrow(expect.objectContaining({
         stderr: expect.stringContaining('monorepo'),
       }))

tests/unit/benchmark-graph-stats.test.ts:
-      ], { cwd: process.cwd(), encoding: 'utf8', stdio: 'pipe' })).toThrowError(
+      ], { cwd: process.cwd(), encoding: 'utf8', stdio: 'pipe' })).toThrow(
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/benchmark-real-workspace.test.ts` around lines 135 - 137, Replace
the deprecated matcher call expect(...).toThrowError(...) with
expect(...).toThrow(...) in the failing tests: locate the occurrences of
expect(...).toThrowError(expect.objectContaining({ stderr:
expect.stringContaining('GRAPHIFY_BENCH_REAL_PROMPTS') })) in the
benchmark-real-workspace test and the similar expect(...).toThrowError(...) in
benchmark-graph-stats.test.ts (the exact invocation uses expect.objectContaining
and expect.stringContaining) and swap toThrowError to toThrow while keeping the
objectContaining argument unchanged so the assertion semantics remain the same.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/runtime/context-pack-resolution.ts`:
- Around line 382-383: The output can vary because relationLabels(node,
relationIndex, 'outgoing', ['uses_config']) and relationLabels(node,
relationIndex, 'outgoing', ['reads_env']) preserve insertion order; to make
sketch output deterministic, sort the arrays returned by relationLabels (e.g.,
sort config and readsEnv) before any rendering/serialization. Locate places
where config and readsEnv are used (the calls to relationLabels and subsequent
rendering logic around the variables named config and readsEnv, plus the similar
block around lines 402-407) and replace direct use with a stableSortedConfig and
stableSortedReadsEnv (or sort in-place) so downstream code always consumes a
consistently ordered list.

In `@tests/unit/benchmark-graph-stats.test.ts`:
- Around line 33-43: Add a new unit test (next to the existing "falls back to
zero edges when graph.json omits the edges array") that uses withTempGraph to
write a graph JSON where edges is present but not an array (e.g., edges: null or
edges: {} or a string) and then calls the same script
('docs/benchmarks/2026-05-11-spi-vs-legacy/graph-stats.mjs') via execFileSync;
assert JSON.parse(output) equals { node_count: 2, edge_count: 0 } to ensure the
graph-stats.mjs code path that reads edges treats non-array values the same as
missing arrays.

In `@tests/unit/benchmark-real-workspace.test.ts`:
- Around line 135-137: Replace the deprecated matcher call
expect(...).toThrowError(...) with expect(...).toThrow(...) in the failing
tests: locate the occurrences of
expect(...).toThrowError(expect.objectContaining({ stderr:
expect.stringContaining('GRAPHIFY_BENCH_REAL_PROMPTS') })) in the
benchmark-real-workspace test and the similar expect(...).toThrowError(...) in
benchmark-graph-stats.test.ts (the exact invocation uses expect.objectContaining
and expect.stringContaining) and swap toThrowError to toThrow while keeping the
objectContaining argument unchanged so the assertion semantics remain the same.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 76a15b07-9bac-422f-b668-9af5e8bf3c51

📥 Commits

Reviewing files that changed from the base of the PR and between 965fde8 and debf9ba.

📒 Files selected for processing (18)
  • docs/benchmarks/2026-05-11-spi-vs-legacy/graph-stats.mjs
  • docs/benchmarks/2026-05-11-spi-vs-legacy/probe.mjs
  • docs/benchmarks/2026-05-11-spi-vs-legacy/run-real-workspace.sh
  • docs/benchmarks/2026-05-11-spi-vs-legacy/summarize-real-workspaces.mjs
  • src/infrastructure/context-pack-command.ts
  • src/runtime/benchmark/probe-calibration.ts
  • src/runtime/context-pack-resolution.ts
  • src/runtime/retrieve.ts
  • src/runtime/retrieve/slicing.ts
  • src/runtime/stdio/tools.ts
  • tests/unit/benchmark-graph-stats.test.ts
  • tests/unit/benchmark-probe-calibration.test.ts
  • tests/unit/benchmark-real-workspace.test.ts
  • tests/unit/context-pack-resolution-sketch.test.ts
  • tests/unit/retrieve-semantic.test.ts
  • tests/unit/retrieve-slice-surface.test.ts
  • tests/unit/retrieve-slice-v1.test.ts
  • tests/unit/stdio-slice-surface.test.ts
✅ Files skipped from review due to trivial changes (2)
  • tests/unit/benchmark-probe-calibration.test.ts
  • tests/unit/retrieve-slice-surface.test.ts
🚧 Files skipped from review as they are similar to previous changes (7)
  • docs/benchmarks/2026-05-11-spi-vs-legacy/graph-stats.mjs
  • docs/benchmarks/2026-05-11-spi-vs-legacy/summarize-real-workspaces.mjs
  • docs/benchmarks/2026-05-11-spi-vs-legacy/probe.mjs
  • tests/unit/retrieve-slice-v1.test.ts
  • docs/benchmarks/2026-05-11-spi-vs-legacy/run-real-workspace.sh
  • src/runtime/retrieve/slicing.ts
  • src/runtime/retrieve.ts

@mohanagy mohanagy merged commit c681f2f into main May 11, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant