Conversation
…king when knowledge_base_url set (#150)

Adds a `result_grouping` parameter to the search and preflight actions:

- `'merged'` (default when KB unset): pure BM25 score order (no behavior change)
- `'overlay_first'` (default when KB set): overlay docs ranked above baseline
- `'grouped'`: separate `overlay_hits`/`baseline_hits` arrays in the response

Changes:

- orchestrate.ts: `ResultGrouping` type, `partitionBySource` utility, `runSearch` and `runPreflight` accept `resolvedGrouping`, conditional default in the dispatcher
- index.ts: `result_grouping` in the unified oddkit + oddkit_search + oddkit_preflight tool schemas, threaded through handler args
- telemetry.ts: `parseToolCall` extracts `result_grouping`; blob9 repurposed
- CHANGELOG.md: entry under [Unreleased]
- workers/test/result-grouping.test.mjs: 24 tests covering partition logic, conditional defaults, grouped shape, preflight partition, and telemetry recording
- workers/test/telemetry-integration.test.mjs: updated blob count assertion

No version bump (orchestrator decides packaging cadence). No changes to workers/src/bm25.ts. No cache-key change.
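The conditional default described above can be sketched as follows. This is illustrative, not the exact source: `resolveGrouping` is a hypothetical helper name for the inline expression the dispatcher uses.

```typescript
type ResultGrouping = "merged" | "overlay_first" | "grouped";

// Hypothetical helper mirroring the inline default in handleUnifiedAction:
// an explicit caller value always wins; otherwise the default depends on
// whether a knowledge base URL is present.
function resolveGrouping(
  requested: ResultGrouping | undefined,
  knowledgeBaseUrl: string | undefined,
): ResultGrouping {
  return requested ?? (knowledgeBaseUrl ? "overlay_first" : "merged");
}

console.log(resolveGrouping(undefined, undefined));                // "merged"
console.log(resolveGrouping(undefined, "https://github.com/x/y")); // "overlay_first"
console.log(resolveGrouping("merged", "https://github.com/x/y"));  // "merged"
```

An explicit `"merged"` therefore still gets pure score order even when a knowledge base is configured.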
Deploying with

| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! View logs | oddkit | 885fcc9 | Commit Preview URL · Branch Preview URL | Apr 28 2026, 08:18 PM |
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Grouped search NO_MATCH response omits grouped arrays
- Updated the NO_MATCH early return in runSearch to include empty overlay_hits and baseline_hits arrays when resolvedGrouping is "grouped", matching the FOUND path's response shape.
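The shape alignment the fix makes can be sketched as follows. This is illustrative: `buildNoMatch` is a hypothetical name, but the field names (`status`, `docs_considered`, `hits`, `overlay_hits`, `baseline_hits`) come from the diff.

```typescript
type ResultGrouping = "merged" | "overlay_first" | "grouped";

// Illustrative: build the NO_MATCH result, adding empty grouped arrays only
// when the resolved grouping is "grouped", so the response shape matches
// what the FOUND path returns for the same grouping.
function buildNoMatch(docsConsidered: number, grouping: ResultGrouping) {
  const result: Record<string, unknown> = {
    status: "NO_MATCH",
    docs_considered: docsConsidered,
    hits: [],
  };
  if (grouping === "grouped") {
    result.overlay_hits = [];
    result.baseline_hits = [];
  }
  return result;
}
```

Without this, a caller requesting `"grouped"` would see `overlay_hits`/`baseline_hits` only on hits, never on misses.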
Preview (63eb8e01fa)
diff --git a/CHANGELOG.md b/CHANGELOG.md
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,10 @@
## [Unreleased]
+### Added
+
+- **`result_grouping` parameter for search and preflight** — when `knowledge_base_url` is set, overlay (knowledge-base) docs are ranked above baseline docs by default (`"overlay_first"`). Callers can explicitly choose `"merged"` (pure BM25 score order, the previous default), `"overlay_first"` (overlay before baseline, preserving score order within each partition), or `"grouped"` (separate `overlay_hits`/`baseline_hits` arrays in search, `start_here_overlay`/`start_here_baseline` in preflight). Conditional default: `knowledge_base_url` unset → `"merged"` (no behavior change); `knowledge_base_url` set → `"overlay_first"`. Telemetry records the caller-specified value in blob9 (`result_grouping`). Closes #150.
+
## [0.26.0] - 2026-04-26
### Added
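The three orderings the changelog entry describes can be sketched with a stable partition. The `partitionBySource` body mirrors the utility added in orchestrate.ts below; the hit data is made up for illustration.

```typescript
type Hit = { path: string; source: "canon" | "baseline"; score: number };

// Stable partition: one forward pass, BM25 score order preserved within
// each side (no re-sort).
function partitionBySource(hits: Hit[]): { overlay: Hit[]; baseline: Hit[] } {
  const overlay: Hit[] = [];
  const baseline: Hit[] = [];
  for (const h of hits) (h.source === "canon" ? overlay : baseline).push(h);
  return { overlay, baseline };
}

// Hits in pure BM25 score order, canon and baseline interleaved:
const merged: Hit[] = [
  { path: "canon/a.md", source: "canon", score: 10 },
  { path: "docs/b.md", source: "baseline", score: 8 },
  { path: "canon/c.md", source: "canon", score: 6 },
];

const { overlay, baseline } = partitionBySource(merged);
const overlayFirst = [...overlay, ...baseline];
// overlayFirst paths: canon/a.md, canon/c.md, docs/b.md
```

`"merged"` returns `merged` untouched, `"overlay_first"` returns `overlayFirst`, and `"grouped"` exposes `overlay` and `baseline` as separate arrays.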
diff --git a/workers/src/index.ts b/workers/src/index.ts
--- a/workers/src/index.ts
+++ b/workers/src/index.ts
@@ -220,6 +220,7 @@
"canon-tier-2", "canon-tier-1", "published-essay",
]).optional().describe("Optional mode hint. Epistemic modes (exploration/planning/execution) or writing-lifecycle modes (voice-dump/drafting/peer-review-ready/canon-tier-2/canon-tier-1/published-essay). Sourced from odd/challenge/stakes-calibration."),
knowledge_base_url: z.string().optional().describe("Optional GitHub repo URL for your knowledge base. When set, strict mode is automatic: missing files fall through to the bundled governance tier rather than silently substituting from the default knowledge base."),
+ result_grouping: z.enum(["merged", "overlay_first", "grouped"]).optional().describe("For action='search' or 'preflight': controls how overlay (knowledge_base) and baseline results are ordered. 'merged' = pure score order (default when knowledge_base_url unset). 'overlay_first' = overlay docs ranked above baseline (default when knowledge_base_url set). 'grouped' = separate overlay_hits/baseline_hits arrays in response."),
include_metadata: z.boolean().optional().describe("When true, search/get responses include a metadata object with full parsed frontmatter. Default: false."),
section: z.string().optional().describe("For action='get': extract only the named ## section from the document. Returns section content or available sections if not found."),
sort_by: z.enum(["date", "path"]).optional().describe("For action='catalog': sort articles. 'date' returns newest first (requires frontmatter). 'path' returns all docs alphabetically, including undated."),
@@ -241,6 +242,7 @@
context: args.context,
mode: args.mode,
knowledge_base_url: args.knowledge_base_url,
+ result_grouping: args.result_grouping,
include_metadata: args.include_metadata,
section: args.section,
sort_by: args.sort_by,
@@ -321,6 +323,7 @@
schema: {
input: z.string().describe("Natural language query or tags to search for."),
knowledge_base_url: z.string().optional().describe("Optional: GitHub repo URL for your knowledge base. When set, strict mode is automatic: missing files fall through to the bundled governance tier."),
+ result_grouping: z.enum(["merged", "overlay_first", "grouped"]).optional().describe("Controls how overlay (knowledge_base) and baseline results are ordered. 'merged' = pure score order (default when knowledge_base_url unset). 'overlay_first' = overlay docs ranked above baseline (default when knowledge_base_url set). 'grouped' = separate overlay_hits/baseline_hits arrays in response."),
include_metadata: z.boolean().optional().describe("When true, each hit includes a metadata object with full parsed frontmatter. Default: false."),
},
annotations: { readOnlyHint: true, destructiveHint: false, idempotentHint: true, openWorldHint: true },
@@ -386,6 +389,7 @@
schema: {
input: z.string().describe("Description of what you're about to implement."),
knowledge_base_url: z.string().optional().describe("Optional: GitHub repo URL for your knowledge base. When set, strict mode is automatic: missing files fall through to the bundled governance tier."),
+ result_grouping: z.enum(["merged", "overlay_first", "grouped"]).optional().describe("Controls how overlay (knowledge_base) and baseline start_here results are ordered. 'merged' = pure score order (default when knowledge_base_url unset). 'overlay_first' = overlay docs ranked above baseline (default when knowledge_base_url set). 'grouped' = separate start_here_overlay/start_here_baseline arrays."),
},
annotations: { readOnlyHint: true, destructiveHint: false, idempotentHint: true, openWorldHint: true },
},
@@ -432,6 +436,7 @@
context: args.context as string | undefined,
mode: args.mode as string | undefined,
knowledge_base_url: args.knowledge_base_url as string | undefined,
+ result_grouping: args.result_grouping as "merged" | "overlay_first" | "grouped" | undefined,
include_metadata: args.include_metadata as boolean | undefined,
section: args.section as string | undefined,
sort_by: args.sort_by as string | undefined,
diff --git a/workers/src/orchestrate.ts b/workers/src/orchestrate.ts
--- a/workers/src/orchestrate.ts
+++ b/workers/src/orchestrate.ts
@@ -225,12 +225,15 @@
let cachedGatePrerequisitesKnowledgeBaseUrl: string | undefined = undefined;
let cachedGatePrerequisitesSource: "knowledge_base" | "minimal" = "minimal";
+export type ResultGrouping = "merged" | "overlay_first" | "grouped";
+
export interface UnifiedParams {
action: string;
input: string;
context?: string;
mode?: string;
knowledge_base_url?: string;
+ result_grouping?: ResultGrouping;
include_metadata?: boolean;
section?: string;
sort_by?: string;
@@ -1321,6 +1324,20 @@
}
// ──────────────────────────────────────────────────────────────────────────────
+// Result grouping — stable partition by source (overlay=canon vs baseline)
+// Preserves BM25 score order within each partition. Single forward pass — no re-sort.
+// ──────────────────────────────────────────────────────────────────────────────
+
+export function partitionBySource<T extends { source: "canon" | "baseline" }>(
+ arr: T[],
+): { overlay: T[]; baseline: T[] } {
+ const overlay: T[] = [];
+ const baseline: T[] = [];
+ for (const h of arr) (h.source === "canon" ? overlay : baseline).push(h);
+ return { overlay, baseline };
+}
+
+// ──────────────────────────────────────────────────────────────────────────────
// Individual action handlers
// ──────────────────────────────────────────────────────────────────────────────
@@ -1330,6 +1347,7 @@
knowledgeBaseUrl?: string,
state?: OddkitState,
includeMetadata?: boolean,
+ resolvedGrouping: ResultGrouping = "merged",
): Promise<ActionResult> {
const startMs = Date.now();
const index = await fetcher.getIndex(knowledgeBaseUrl);
@@ -1354,13 +1372,18 @@
: undefined;
if (hits.length === 0) {
+ const noMatchResult: Record<string, unknown> = {
+ status: "NO_MATCH",
+ docs_considered: index.entries.length,
+ hits: [],
+ };
+ if (resolvedGrouping === "grouped") {
+ noMatchResult.overlay_hits = [];
+ noMatchResult.baseline_hits = [];
+ }
return {
action: "search",
- result: {
- status: "NO_MATCH",
- docs_considered: index.entries.length,
- hits: [],
- },
+ result: noMatchResult,
state: updatedState,
assistant_text: `Searched ${index.stats.total} documents but found no matches for "${input}". Try rephrasing or ask with action "catalog" to see available documentation.`,
debug: {
@@ -1373,12 +1396,22 @@
};
}
+ // Apply result_grouping partition (overlay_first / grouped re-order;
+ // merged preserves BM25 score order). Single forward pass — no re-sort.
+ let orderedHits = hits;
+ let isGrouped = false;
+ if (resolvedGrouping === "overlay_first" || resolvedGrouping === "grouped") {
+ const { overlay, baseline } = partitionBySource(hits);
+ orderedHits = [...overlay, ...baseline];
+ isGrouped = resolvedGrouping === "grouped";
+ }
+
// Cache for fetched content to avoid redundant fetches when include_metadata is enabled
const contentCache = new Map<string, string>();
- // Fetch excerpts for top results
+ // Fetch excerpts for top results (uses partitioned order)
const evidence: Array<{ quote: string; citation: string; source: string }> = [];
- for (const entry of hits.slice(0, 3)) {
+ for (const entry of orderedHits.slice(0, 3)) {
const content = await fetcher.getFile(entry.path, knowledgeBaseUrl);
if (content) {
contentCache.set(entry.path, content);
@@ -1394,17 +1427,18 @@
}
const assistantLines = [
- `Found ${hits.length} result(s) for: "${input}"`,
+ `Found ${orderedHits.length} result(s) for: "${input}"`,
"",
...evidence.map((e) => `> ${e.quote}\n— ${e.citation} (${e.source})`),
"",
"Results:",
- ...hits.map((r) => `- \`${r.path}\` — ${r.title} (score: ${r.score.toFixed(2)}, ${r.source})`),
+ ...orderedHits.map((r) => `- \`${r.path}\` — ${r.title} (score: ${r.score.toFixed(2)}, ${r.source})`),
];
- // When include_metadata is requested, fetch and parse frontmatter for each hit
+ // When include_metadata is requested, fetch and parse frontmatter for each hit.
+ // Iterates orderedHits so metadata-enriched array preserves the partitioned order.
const hitsWithMetadata: Array<Record<string, unknown>> = [];
- for (const h of hits) {
+ for (const h of orderedHits) {
const hit: Record<string, unknown> = {
uri: h.uri,
path: h.path,
@@ -1425,14 +1459,27 @@
hitsWithMetadata.push(hit);
}
+ // Build result object — add overlay_hits / baseline_hits for "grouped" mode
+ const resultObj: Record<string, unknown> = {
+ status: "FOUND",
+ hits: hitsWithMetadata,
+ evidence,
+ docs_considered: index.entries.length,
+ };
+
+ if (isGrouped) {
+ const overlayHits: Record<string, unknown>[] = [];
+ const baselineHits: Record<string, unknown>[] = [];
+ for (const h of hitsWithMetadata) {
+ (h.source === "canon" ? overlayHits : baselineHits).push(h);
+ }
+ resultObj.overlay_hits = overlayHits;
+ resultObj.baseline_hits = baselineHits;
+ }
+
return {
action: "search",
- result: {
- status: "FOUND",
- hits: hitsWithMetadata,
- evidence,
- docs_considered: index.entries.length,
- },
+ result: resultObj,
state: updatedState,
assistant_text: assistantLines.join("\n").trim(),
debug: {
@@ -2327,22 +2374,32 @@
fetcher: KnowledgeBaseFetcher,
knowledgeBaseUrl?: string,
state?: OddkitState,
+ resolvedGrouping: ResultGrouping = "merged",
): Promise<ActionResult> {
const startMs = Date.now();
const index = await fetcher.getIndex(knowledgeBaseUrl);
const topic = message.replace(/^preflight:\s*/i, "").trim();
- const results = scoreEntries(index.entries, topic).slice(0, 5);
+ // Score all entries, then apply partition before slicing
+ const allScored = scoreEntries(index.entries, topic);
+ let orderedScored = allScored;
+ if (resolvedGrouping === "overlay_first" || resolvedGrouping === "grouped") {
+ const { overlay, baseline } = partitionBySource(allScored);
+ orderedScored = [...overlay, ...baseline];
+ }
+ const results = orderedScored.slice(0, 5);
+
const dodEntry = index.entries.find((e) => e.path.toLowerCase().includes("definition-of-done"));
const constraints = index.entries
.filter((e) => e.path.includes("constraint") || e.authority_band === "governing")
.slice(0, 3);
+ const startHere = results.slice(0, 3);
const assistantText = [
`Preflight: ${topic}`,
``,
`Start here:`,
- ...results.slice(0, 3).map((r) => `- \`${r.path}\` — ${r.title}`),
+ ...startHere.map((r) => `- \`${r.path}\` — ${r.title}`),
``,
`Definition of Done:`,
dodEntry ? `- \`${dodEntry.path}\`` : "- Check canon/definition-of-done.md",
@@ -2358,15 +2415,25 @@
.join("\n")
.trim();
+ // Build result object
+ const resultObj: Record<string, unknown> = {
+ topic,
+ start_here: startHere.map((r) => r.path),
+ dod: dodEntry?.path,
+ constraints: constraints.map((c) => c.path),
+ docs_available: index.stats.total,
+ };
+
+ // For "grouped" mode, split start_here into overlay and baseline arrays (each capped at 3)
+ if (resolvedGrouping === "grouped") {
+ const { overlay, baseline } = partitionBySource(allScored);
+ resultObj.start_here_overlay = overlay.slice(0, 3).map((r) => r.path);
+ resultObj.start_here_baseline = baseline.slice(0, 3).map((r) => r.path);
+ }
+
return {
action: "preflight",
- result: {
- topic,
- start_here: results.slice(0, 3).map((r) => r.path),
- dod: dodEntry?.path,
- constraints: constraints.map((c) => c.path),
- docs_available: index.stats.total,
- },
+ result: resultObj,
state: state ? initState(state) : undefined,
assistant_text: assistantText,
debug: {
@@ -3251,8 +3318,14 @@
] as const;
export async function handleUnifiedAction(params: UnifiedParams): Promise<OddkitEnvelope> {
- const { action, input, context, mode, knowledge_base_url, include_metadata, section, sort_by, limit, offset, filter_epoch, state, env, tracer } = params;
+ const { action, input, context, mode, knowledge_base_url, result_grouping, include_metadata, section, sort_by, limit, offset, filter_epoch, state, env, tracer } = params;
+ // Conditional default: when knowledge_base_url is set and caller didn't
+ // specify result_grouping, default to "overlay_first" (the fix for #150).
+ // When KB is unset, default to "merged" (no behavior change).
+ const resolvedGrouping: ResultGrouping =
+ result_grouping ?? (knowledge_base_url ? "overlay_first" : "merged");
+
if (!VALID_ACTIONS.includes(action as (typeof VALID_ACTIONS)[number])) {
return {
action: "error",
@@ -3283,7 +3356,7 @@
result = await runEncodeAction(input, context, fetcher, knowledge_base_url, state);
break;
case "search":
- result = await runSearch(input, fetcher, knowledge_base_url, state, include_metadata);
+ result = await runSearch(input, fetcher, knowledge_base_url, state, include_metadata, resolvedGrouping);
break;
case "get":
result = await runGet(input, fetcher, knowledge_base_url, state, include_metadata, section);
@@ -3301,7 +3374,7 @@
result = await runValidate(input, state);
break;
case "preflight":
- result = await runPreflight(input, fetcher, knowledge_base_url, state);
+ result = await runPreflight(input, fetcher, knowledge_base_url, state, resolvedGrouping);
break;
case "version":
result = runVersion(env);
@@ -3310,7 +3383,7 @@
result = await runCleanupStorage(fetcher, knowledge_base_url);
break;
default:
- result = await runSearch(input, fetcher, knowledge_base_url, state);
+ result = await runSearch(input, fetcher, knowledge_base_url, state, undefined, resolvedGrouping);
}
// Inject trace into debug envelope (E0008.1)
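The preflight change above can be summarized with a small sketch. This is illustrative: `groupedStartHere` is a hypothetical name, but the `start_here_overlay`/`start_here_baseline` fields and the cap of 3 per side come from the diff.

```typescript
type Scored = { path: string; source: "canon" | "baseline"; score: number };

// Stable one-pass partition, as in orchestrate.ts.
function partitionBySource(arr: Scored[]): { overlay: Scored[]; baseline: Scored[] } {
  const overlay: Scored[] = [];
  const baseline: Scored[] = [];
  for (const h of arr) (h.source === "canon" ? overlay : baseline).push(h);
  return { overlay, baseline };
}

// For "grouped" preflight: partition the full scored list (not the
// already-sliced top 5), then cap each side at 3.
function groupedStartHere(allScored: Scored[]) {
  const { overlay, baseline } = partitionBySource(allScored);
  return {
    start_here_overlay: overlay.slice(0, 3).map((r) => r.path),
    start_here_baseline: baseline.slice(0, 3).map((r) => r.path),
  };
}
```

Partitioning before slicing matters: slicing first could drop every overlay doc when baseline docs dominate the top 5.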
diff --git a/workers/src/telemetry.ts b/workers/src/telemetry.ts
--- a/workers/src/telemetry.ts
+++ b/workers/src/telemetry.ts
@@ -185,6 +185,7 @@
toolName: string;
documentUri: string;
knowledgeBaseUrl: string;
+ resultGrouping: string;
} | null {
if (typeof payload !== "object" || payload === null || !("method" in payload)) {
return null;
@@ -197,7 +198,7 @@
const params = msg.params;
if (typeof params !== "object" || params === null) {
- return { method, toolName: "", documentUri: "", knowledgeBaseUrl: "" };
+ return { method, toolName: "", documentUri: "", knowledgeBaseUrl: "", resultGrouping: "" };
}
const p = params as Record<string, unknown>;
@@ -206,6 +207,7 @@
// Extract details from tool arguments
let documentUri = "";
let knowledgeBaseUrl = "";
+ let resultGrouping = "";
const args = p.arguments;
if (typeof args === "object" && args !== null) {
const a = args as Record<string, unknown>;
@@ -217,9 +219,13 @@
if (typeof a.knowledge_base_url === "string" && a.knowledge_base_url) {
knowledgeBaseUrl = a.knowledge_base_url;
}
+ // Extract result_grouping from tool arguments (#150)
+ if (typeof a.result_grouping === "string" && a.result_grouping) {
+ resultGrouping = a.result_grouping;
+ }
}
- return { method, toolName, documentUri, knowledgeBaseUrl };
+ return { method, toolName, documentUri, knowledgeBaseUrl, resultGrouping };
}
// ──────────────────────────────────────────────────────────────────────────────
@@ -297,9 +303,10 @@
toolCall?.knowledgeBaseUrl || env.DEFAULT_KNOWLEDGE_BASE_URL || "",
documentUri,
env.ODDKIT_VERSION || BUILD_VERSION,
- // blob9 retired (was cache_tier). Slot stays free per the
- // "no deprecation, nobody uses them yet" rule. Cache effectiveness
- // moved to double7/double8.
+ // blob9: result_grouping (#150). Was retired (cache_tier).
+ // Repurposed for the caller-specified grouping value; empty string
+ // when not applicable (non-search/preflight actions).
+ toolCall?.resultGrouping ?? "",
],
doubles: [
1, // double1: count
@@ -354,9 +361,7 @@
"knowledge_base_url", // blob6
"document_uri", // blob7
"worker_version", // blob8
- // blob9 retired (was cache_tier). Slot stays free per the
- // "no deprecation, nobody uses them yet" rule. Hit-rate moved to
- // double7/double8.
+ "result_grouping", // blob9 — repurposed from retired cache_tier (#150)
] as const;
/**
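The blob9 extraction in the telemetry diff above can be sketched as follows. This is illustrative: `extractResultGrouping` is a hypothetical name for the guard logic `parseToolCall` applies to tool-call arguments.

```typescript
// Illustrative: pull result_grouping out of tool-call arguments, falling
// back to "" when the arguments are missing, not an object, or the field
// is absent or empty (e.g. non-search/preflight actions).
function extractResultGrouping(args: unknown): string {
  if (typeof args !== "object" || args === null) return "";
  const a = args as Record<string, unknown>;
  return typeof a.result_grouping === "string" && a.result_grouping
    ? a.result_grouping
    : "";
}
```

The empty-string fallback keeps blob9 well-defined for every recorded call rather than only for search/preflight.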
diff --git a/workers/test/result-grouping.test.mjs b/workers/test/result-grouping.test.mjs
new file mode 100644
--- /dev/null
+++ b/workers/test/result-grouping.test.mjs
@@ -1,0 +1,533 @@
+#!/usr/bin/env node
+/**
+ * Unit + integration tests for the result_grouping feature (#150).
+ *
+ * Tests:
+ * - partitionBySource: stable partition, edge cases, ordering guarantees
+ * - Conditional default: KB set → overlay_first, KB unset → merged
+ * - Grouped shape construction: overlay_hits / baseline_hits arrays
+ * - Preflight partition: start_here_overlay / start_here_baseline
+ * - Telemetry: blob9 carries result_grouping value
+ *
+ * Compiles orchestrate.ts + telemetry.ts via tsc into a temp dir, then
+ * dynamic-imports the compiled .js. Same pattern as tokenize.test.mjs
+ * and telemetry-integration.test.mjs.
+ */
+
+import assert from "node:assert/strict";
+import { spawnSync } from "node:child_process";
+import { mkdtempSync, writeFileSync, symlinkSync, existsSync, readdirSync, readFileSync } from "node:fs";
+import { tmpdir } from "node:os";
+import { join, dirname } from "node:path";
+import { fileURLToPath } from "node:url";
+
+const __dirname = dirname(fileURLToPath(import.meta.url));
+const WORKERS_ROOT = join(__dirname, "..");
+
+// ─── Compile orchestrate.ts + telemetry.ts to temp dir ────────────────────
+
+const tmp = mkdtempSync(join(tmpdir(), "oddkit-result-grouping-test-"));
+const tsconfig = {
+ compilerOptions: {
+ target: "ES2022",
+ module: "ES2022",
+ moduleResolution: "bundler",
+ lib: ["ES2022", "DOM"],
+ types: ["@cloudflare/workers-types"],
+ noEmitOnError: false,
+ strict: false,
+ skipLibCheck: true,
+ resolveJsonModule: true,
+ allowSyntheticDefaultImports: true,
+ esModuleInterop: true,
+ rootDir: join(WORKERS_ROOT, "src"),
+ outDir: join(tmp, "build"),
+ },
+ include: [
+ join(WORKERS_ROOT, "src", "orchestrate.ts"),
+ join(WORKERS_ROOT, "src", "telemetry.ts"),
+ join(WORKERS_ROOT, "src", "tracing.ts"),
+ join(WORKERS_ROOT, "src", "zip-baseline-fetcher.ts"),
+ join(WORKERS_ROOT, "src", "bm25.ts"),
+ join(WORKERS_ROOT, "src", "markdown-utils.ts"),
+ ],
+};
+const tsconfigPath = join(tmp, "tsconfig.json");
+writeFileSync(tsconfigPath, JSON.stringify(tsconfig, null, 2));
+
+const tmpNodeModules = join(tmp, "node_modules");
+if (!existsSync(tmpNodeModules)) {
+ symlinkSync(join(WORKERS_ROOT, "node_modules"), tmpNodeModules);
+}
+// orchestrate.ts imports ../package.json
+if (!existsSync(join(tmp, "package.json"))) {
+ symlinkSync(join(WORKERS_ROOT, "package.json"), join(tmp, "package.json"));
+}
+
+const compile = spawnSync("npx", ["--yes", "tsc", "-p", tsconfigPath], {
+ encoding: "utf8",
+});
+
+// With noEmitOnError: false, tsc may exit non-zero on type errors in the dep
+// graph (zip-baseline-fetcher.ts has workers-types friction) while still
+// producing the .js files we need. Only bail if target files weren't emitted.
+const buildDir = join(tmp, "build");
+const orchestrateJs = join(buildDir, "orchestrate.js");
+const telemetryJs = join(buildDir, "telemetry.js");
+if (!existsSync(orchestrateJs) || !existsSync(telemetryJs)) {
+ console.error("TypeScript compile failed (target files not emitted):");
+ console.error(compile.stdout);
+ console.error(compile.stderr);
+ process.exit(1);
+}
+if (compile.status !== 0 && process.env.DEBUG) {
+ console.error("Note: tsc reported errors but target .js files were emitted:");
+ console.error(compile.stdout);
+}
+
+// Patch compiled files: JSON import assertions + extensionless local imports
+for (const f of readdirSync(buildDir).filter((n) => n.endsWith(".js"))) {
+ const fpath = join(buildDir, f);
+ let src = readFileSync(fpath, "utf8");
+ src = src.replace(
+ /from ["']\.\.\/package\.json["'];/g,
+ 'from "../package.json" with { type: "json" };',
+ );
+ src = src.replace(
+ /from ["'](\.\/[^"'.]+)["'];/g,
+ 'from "$1.js";',
+ );
+ writeFileSync(fpath, src);
+}
+
+// Import the compiled module
+const { partitionBySource } = await import(orchestrateJs);
+const { recordTelemetry, parseToolCall } = await import(telemetryJs);
+
+// Also import tokenize for telemetry shape tests
+const tokenizeJs = join(buildDir, "tokenize.js");
+let measurePayloadShape = null;
+if (existsSync(tokenizeJs)) {
+ const tok = await import(tokenizeJs);
+ measurePayloadShape = tok.measurePayloadShape;
+}
+
+// ─── Test harness ─────────────────────────────────────────────────────────
+
+let pass = 0;
+let fail = 0;
+
+async function test(name, fn) {
+ try {
+ await fn();
+ console.log(` ✓ ${name}`);
+ pass++;
+ } catch (err) {
+ console.log(` ✗ ${name}`);
+ console.log(` ${err.message}`);
+ if (err.stack && process.env.DEBUG) console.log(err.stack);
+ fail++;
+ }
+}
+
+console.log("result-grouping tests (#150)\n");
+
+// ─── Test fixtures ────────────────────────────────────────────────────────
+
+// Fixture: mixed entries with interleaving scores
+// canon entries have scores 10, 6 (ranked 1st, 3rd in BM25 order)
+// baseline entries have scores 8, 4 (ranked 2nd, 4th in BM25 order)
+// This ensures partition actually reorders — a fixture where canon always
+// outscores baseline would prove nothing.
+const mixedHits = [
+ { path: "canon/a.md", title: "Canon A", source: "canon", score: 10 },
+ { path: "docs/b.md", title: "Baseline B", source: "baseline", score: 8 },
+ { path: "canon/c.md", title: "Canon C", source: "canon", score: 6 },
+ { path: "docs/d.md", title: "Baseline D", source: "baseline", score: 4 },
+];
+
+const canonOnly = [
+ { path: "canon/x.md", title: "Canon X", source: "canon", score: 10 },
+ { path: "canon/y.md", title: "Canon Y", source: "canon", score: 5 },
+];
+
+const baselineOnly = [
+ { path: "docs/m.md", title: "Baseline M", source: "baseline", score: 9 },
+ { path: "docs/n.md", title: "Baseline N", source: "baseline", score: 3 },
+];
+
+// ─── partitionBySource tests ──────────────────────────────────────────────
+
+console.log("partitionBySource:");
+
+await test("splits mixed entries into overlay (canon) and baseline", () => {
+ const { overlay, baseline } = partitionBySource(mixedHits);
+ assert.equal(overlay.length, 2, "should have 2 overlay entries");
+ assert.equal(baseline.length, 2, "should have 2 baseline entries");
+ assert.ok(overlay.every((h) => h.source === "canon"), "all overlay should be canon");
+ assert.ok(baseline.every((h) => h.source === "baseline"), "all baseline should be baseline");
+});
+
+await test("preserves BM25 score order within each partition (stability)", () => {
+ const { overlay, baseline } = partitionBySource(mixedHits);
+ // overlay: canon/a (10) then canon/c (6)
+ assert.equal(overlay[0].path, "canon/a.md");
+ assert.equal(overlay[1].path, "canon/c.md");
+ assert.ok(overlay[0].score >= overlay[1].score, "overlay should be descending score");
+ // baseline: docs/b (8) then docs/d (4)
+ assert.equal(baseline[0].path, "docs/b.md");
+ assert.equal(baseline[1].path, "docs/d.md");
+ assert.ok(baseline[0].score >= baseline[1].score, "baseline should be descending score");
+});
+
+await test("overlay_first reorder: all canon before all baseline", () => {
+ const { overlay, baseline } = partitionBySource(mixedHits);
+ const ordered = [...overlay, ...baseline];
+ // Expected: canon/a(10), canon/c(6), docs/b(8), docs/d(4)
+ assert.equal(ordered[0].source, "canon");
+ assert.equal(ordered[1].source, "canon");
+ assert.equal(ordered[2].source, "baseline");
+ assert.equal(ordered[3].source, "baseline");
+ // Scores within tiers are descending
+ assert.ok(ordered[0].score >= ordered[1].score);
+ assert.ok(ordered[2].score >= ordered[3].score);
+});
+
+await test("canon-only input: overlay = all, baseline = empty", () => {
+ const { overlay, baseline } = partitionBySource(canonOnly);
+ assert.equal(overlay.length, 2);
+ assert.equal(baseline.length, 0);
+});
+
+await test("baseline-only input: overlay = empty, baseline = all", () => {
+ const { overlay, baseline } = partitionBySource(baselineOnly);
+ assert.equal(overlay.length, 0);
+ assert.equal(baseline.length, 2);
+});
+
+await test("empty array: both partitions empty", () => {
+ const { overlay, baseline } = partitionBySource([]);
+ assert.equal(overlay.length, 0);
+ assert.equal(baseline.length, 0);
+});
+
+await test("stability: entries with identical scores retain pre-partition relative order", () => {
+ const sameScore = [
+ { path: "canon/first.md", source: "canon", score: 5 },
+ { path: "docs/between.md", source: "baseline", score: 5 },
+ { path: "canon/second.md", source: "canon", score: 5 },
+ { path: "docs/last.md", source: "baseline", score: 5 },
+ ];
+ const { overlay, baseline } = partitionBySource(sameScore);
+ // Within canon: first then second (insertion order preserved)
+ assert.equal(overlay[0].path, "canon/first.md");
+ assert.equal(overlay[1].path, "canon/second.md");
+ // Within baseline: between then last (insertion order preserved)
+ assert.equal(baseline[0].path, "docs/between.md");
+ assert.equal(baseline[1].path, "docs/last.md");
+});
+
+// ─── Conditional default logic tests ──────────────────────────────────────
+
+console.log("\nconditional default:");
+
+await test("KB unset → default is merged", () => {
+ const knowledge_base_url = undefined;
+ const result_grouping = undefined;
+ const resolved = result_grouping ?? (knowledge_base_url ? "overlay_first" : "merged");
+ assert.equal(resolved, "merged");
+});
+
+await test("KB set → default is overlay_first", () => {
+ const knowledge_base_url = "https://github.com/klappy/klappy.dev";
+ const result_grouping = undefined;
+ const resolved = result_grouping ?? (knowledge_base_url ? "overlay_first" : "merged");
+ assert.equal(resolved, "overlay_first");
+});
+
+await test("explicit merged overrides KB-set default", () => {
+ const knowledge_base_url = "https://github.com/klappy/klappy.dev";
+ const result_grouping = "merged";
+ const resolved = result_grouping ?? (knowledge_base_url ? "overlay_first" : "merged");
+ assert.equal(resolved, "merged");
+});
+
+await test("explicit overlay_first works with KB unset", () => {
+ const knowledge_base_url = undefined;
+ const result_grouping = "overlay_first";
+ const resolved = result_grouping ?? (knowledge_base_url ? "overlay_first" : "merged");
+ assert.equal(resolved, "overlay_first");
+});
+
+await test("explicit grouped works regardless of KB", () => {
+ for (const kb of [undefined, "https://github.com/klappy/klappy.dev"]) {
+ const result_grouping = "grouped";
+ const resolved = result_grouping ?? (kb ? "overlay_first" : "merged");
+ assert.equal(resolved, "grouped", `should be grouped when kb=${kb}`);
+ }
+});
+
+// ─── Grouped shape construction tests ─────────────────────────────────────
+
+console.log("\ngrouped shape construction:");
+
+await test("grouped search: overlay_hits and baseline_hits arrays present and correct", () => {
+ // Simulate the grouped shape construction from runSearch
+ const orderedHits = (() => {
+ const { overlay, baseline } = partitionBySource(mixedHits);
+ return [...overlay, ...baseline];
+ })();
+
+ // Simulate metadata enrichment (adds uri field)
+ const hitsWithMetadata = orderedHits.map((h) => ({
+ uri: `klappy://${h.path.replace(".md", "")}`,
+ path: h.path,
+ title: h.title,
+ score: h.score,
+ source: h.source,
+ }));
+
+ // Build grouped shape
+ const overlayHits = [];
+ const baselineHits = [];
+ for (const h of hitsWithMetadata) {
+ (h.source === "canon" ? overlayHits : baselineHits).push(h);
+ }
+
+ // Assertions
+ assert.equal(overlayHits.length, 2, "overlay_hits should have 2 items");
+ assert.equal(baselineHits.length, 2, "baseline_hits should have 2 items");
+ assert.ok(overlayHits.every((h) => h.source === "canon"));
+ assert.ok(baselineHits.every((h) => h.source === "baseline"));
+
+ // hits (back-compat) is overlay-then-baseline
+ assert.equal(hitsWithMetadata[0].source, "canon");
+ assert.equal(hitsWithMetadata[1].source, "canon");
+ assert.equal(hitsWithMetadata[2].source, "baseline");
+ assert.equal(hitsWithMetadata[3].source, "baseline");
+});
+
+await test("grouped with empty overlay: overlay_hits=[], baseline_hits=[...]", () => {
+ const { overlay, baseline } = partitionBySource(baselineOnly);
+ assert.equal(overlay.length, 0);
+ assert.equal(baseline.length, 2);
+
+ const orderedHits = [...overlay, ...baseline];
+ assert.equal(orderedHits.length, 2);
+ assert.ok(orderedHits.every((h) => h.source === "baseline"));
+});
+
+await test("grouped with empty baseline: overlay_hits=[...], baseline_hits=[]", () => {
+ const { overlay, baseline } = partitionBySource(canonOnly);
+ assert.equal(overlay.length, 2);
+ assert.equal(baseline.length, 0);
+
+ const orderedHits = [...overlay, ...baseline];
+ assert.equal(orderedHits.length, 2);
+ assert.ok(orderedHits.every((h) => h.source === "canon"));
+});
+
+// ─── Preflight partition tests ────────────────────────────────────────────
+
+console.log("\npreflight partition:");
+
+await test("preflight overlay_first: partition applied before slice", () => {
+ // Simulate scoreEntries output with interleaving scores
+ const allScored = [
+ { path: "docs/high.md", source: "baseline", score: 20 },
+ { path: "canon/mid-high.md", source: "canon", score: 18 },
+ { path: "docs/mid.md", source: "baseline", score: 15 },
+ { path: "canon/mid-low.md", source: "canon", score: 12 },
+ { path: "docs/low.md", source: "baseline", score: 8 },
+ { path: "canon/lowest.md", source: "canon", score: 3 },
+ ];
+
+ // overlay_first: partition then slice(0, 5)
+ const { overlay, baseline } = partitionBySource(allScored);
+ const ordered = [...overlay, ...baseline];
+ const results = ordered.slice(0, 5);
+ const startHere = results.slice(0, 3).map((r) => r.path);
+
+ // First 3 results should be all canon (3 canon entries exist)
+ assert.equal(startHere[0], "canon/mid-high.md");
+ assert.equal(startHere[1], "canon/mid-low.md");
+ assert.equal(startHere[2], "canon/lowest.md");
+});
+
+await test("preflight grouped: start_here_overlay and start_here_baseline", () => {
+ const allScored = [
+ { path: "docs/high.md", source: "baseline", score: 20 },
+ { path: "canon/mid-high.md", source: "canon", score: 18 },
+ { path: "docs/mid.md", source: "baseline", score: 15 },
+ { path: "canon/mid-low.md", source: "canon", score: 12 },
+ ];
+
+ const { overlay, baseline } = partitionBySource(allScored);
+ const startHereOverlay = overlay.slice(0, 3).map((r) => r.path);
+ const startHereBaseline = baseline.slice(0, 3).map((r) => r.path);
+
+ assert.deepEqual(startHereOverlay, ["canon/mid-high.md", "canon/mid-low.md"]);
+ assert.deepEqual(startHereBaseline, ["docs/high.md", "docs/mid.md"]);
+});
+
+await test("preflight merged: no partition applied (pure score order)", () => {
+ const allScored = [
+ { path: "docs/high.md", source: "baseline", score: 20 },
+ { path: "canon/mid-high.md", source: "canon", score: 18 },
+ { path: "docs/mid.md", source: "baseline", score: 15 },
+ ];
+
+ // merged = just use allScored directly
+ const startHere = allScored.slice(0, 3).map((r) => r.path);
+ assert.deepEqual(startHere, ["docs/high.md", "canon/mid-high.md", "docs/mid.md"]);
+});
+
+// ─── Telemetry: parseToolCall extracts result_grouping ────────────────────
+
+console.log("\ntelemetry:");
+
+await test("parseToolCall extracts result_grouping from oddkit_search arguments", () => {
+ const payload = {
+ jsonrpc: "2.0",
+ id: 1,
+ method: "tools/call",
+ params: {
+ name: "oddkit_search",
+ arguments: {
+ input: "test query",
+ knowledge_base_url: "https://github.com/klappy/klappy.dev",
+ result_grouping: "overlay_first",
+ },
+ },
... diff truncated: showing 800 of 956 lines
…CLI (#150 fix-forward)

- Worker (workers/src/orchestrate.ts): runSearch now pulls 50 BM25 candidates when resolvedGrouping !== "merged", partitions, then slices to FINAL_LIMIT (5). The original implementation truncated to 5 before the partition, making overlay docs at BM25 position 6+ invisible to the partition logic.
- CLI (src/core/actions.js): mirrored — partitionByOrigin helper, conditional default on result_grouping (KB set → overlay_first, else merged), candidate-pool widening, overlay_hits/baseline_hits arrays for grouped mode, NO_MATCH branch includes empty grouped arrays when applicable.
- Tests: 2 new regression cases for partitionBySource against widened pools.
- CHANGELOG: separate Fixed entry for the candidate-pool widening, plus a CLI-parity Added entry.

All worker tests (5 files, 26 tests in result-grouping) and CLI smoke tests pass; tsc --noEmit clean.
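The widening this commit describes can be sketched as follows. This is a hedged illustration, not the actual orchestrate.ts code: the `FINAL_LIMIT` name and the widen-partition-slice order follow the commit message, while `bm25Search`, `WIDE_LIMIT`, and `rankHits` are stand-in names introduced here.

```javascript
// Illustrative sketch of the candidate-pool widening (assumed wiring,
// not the real workers/src/orchestrate.ts). bm25Search stands in for
// the worker's searchBM25(bm25, input, limit) call.
const FINAL_LIMIT = 5;
const WIDE_LIMIT = 50;

function rankHits(bm25Search, input, resolvedGrouping) {
  if (resolvedGrouping === "merged") {
    // pure BM25 score order — the pre-existing path, no widening needed
    return bm25Search(input, FINAL_LIMIT);
  }
  // widen first so overlay docs at BM25 rank 6+ survive the partition
  const candidates = bm25Search(input, WIDE_LIMIT);
  const overlay = candidates.filter((h) => h.source === "canon");
  const baseline = candidates.filter((h) => h.source === "baseline");
  return [...overlay, ...baseline].slice(0, FINAL_LIMIT);
}
```

With a pool of only 5, an overlay doc sitting at BM25 rank 6 never reaches the partition; widening to 50 before slicing is what makes it visible.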
Fix-forward landed in the commit above.

Cursor Bugbot finding addressed. Tests: 26/26 in result-grouping.test.mjs. Awaiting Bugbot re-review on the new commit, then a fresh-context Sonnet 4.6 validator dispatch (canon: klappy://canon/constraints/release-validation-gate).
- runSearch: compute updatedState/canon_refs from orderedHits (the truncated returned set), not the wider 50-candidate BM25 pool. Mirrors src/core/actions.js ordering: partition and truncate first, then derive state.
- runPreflight: build start_here_overlay / start_here_baseline by partitioning startHere (the top 3 of results) instead of the full allScored list, so the grouped arrays are subsets of start_here.
The CLI preflight case in handleAction delegated to runOrchestrate without forwarding resolvedGrouping, so runPreflight always received the default merged ordering. This broke parity with the worker, which already threads resolvedGrouping into runPreflight.

- src/core/actions.js: pass resolvedGrouping into runOrchestrate for the preflight case.
- src/mcp/orchestrate.js: accept a result_grouping option and forward it to runPreflight.
- src/tasks/preflight.js: partition start_here by origin for overlay_first/grouped, and emit start_here_overlay / start_here_baseline for grouped, mirroring workers/src/orchestrate.ts.
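The preflight shaping this commit describes can be sketched roughly as below. Hedged: the entry shape, `partitionByOrigin`, and `shapeStartHere` are assumptions for illustration; the real logic lives in src/tasks/preflight.js, which this sketch mirrors behaviorally rather than reproduces.

```javascript
// Hedged sketch (assumed shapes; not the actual src/tasks/preflight.js).
// CLI entries carry origin: "local" | "baseline"; "local" is the overlay side.
function partitionByOrigin(entries) {
  const overlay = [];
  const baseline = [];
  for (const e of entries) {
    (e.origin === "baseline" ? baseline : overlay).push(e);
  }
  return { overlay, baseline };
}

function shapeStartHere(startHere, resolvedGrouping) {
  if (resolvedGrouping === "merged") return { start_here: startHere };
  const { overlay, baseline } = partitionByOrigin(startHere);
  const ordered = [...overlay, ...baseline];
  if (resolvedGrouping === "overlay_first") return { start_here: ordered };
  // "grouped": back-compat merged list plus the grouped arrays
  return {
    start_here: ordered,
    start_here_overlay: overlay,
    start_here_baseline: baseline,
  };
}
```

The key property is that the grouped arrays are built from the already-truncated start_here list, so they are always subsets of it.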
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 3c8f77f.
…F-1)

The Sonnet 4.6 read-only validator dispatch flagged that runPreflight used 'result_grouping ?? "merged"', which is dead code on every public path (src/core/actions.js pre-resolves before delegating) but would silently return "merged" for any direct importer with a baseline set and no explicit result_grouping. Mirror the worker's conditional default for parity.

Validator: dispatched per klappy://canon/constraints/release-validation-gate Rule 2; verdict PASS with this finding. Validator report archived in the session ledger.
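The dead-code pattern and its fix can be contrasted in a small sketch. The function names here are illustrative, not from the codebase; only the `?? "merged"` expression and the conditional-default rule come from the commit message.

```javascript
// Before (validator finding F-1): nullish-coalescing straight to "merged".
// Dead code on public paths, but wrong for a direct importer that has a
// baseline set and passes no explicit result_grouping.
function resolveGroupingBefore(resultGrouping) {
  return resultGrouping ?? "merged";
}

// After: mirror the dispatcher's conditional default.
function resolveGroupingAfter(resultGrouping, baselineSet) {
  if (resultGrouping) return resultGrouping;
  return baselineSet ? "overlay_first" : "merged";
}
```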
Validator dispatch complete — Sonnet 4.6, fresh context, read-only. Verdict: PASS. Three findings.

Per Rule 1, merge waits for Bugbot to complete. Bugbot already ran on the previous commit.
…pes (Bugbot)

Cursor Bugbot finding (Low): worker debug envelopes for runSearch and runPreflight omitted result_grouping while the CLI counterparts in src/core/actions.js and src/tasks/preflight.js included it. Closes the parity gap so debug observability is consistent across worker and CLI.

The other six Bugbot findings on this PR were point-in-time reviews already addressed by intermediate commits (60fe9b8, ed3dd14, 8cb0b32, 3c8f77f, 96ef432) — verified by inspecting HEAD bytes against each finding's locator.
Bugbot disposition map — checked all 7 inline findings against HEAD bytes.

Six of seven were point-in-time review echoes from earlier commits — Cursor Bugbot reviews each push, and findings on a stale commit don't auto-close when the issue is fixed in a later commit. Only #7 was actually live at HEAD; now closed.

Awaiting Bugbot re-review on the latest commit.
Documents the result_grouping parameter shipped in klappy/oddkit#152 (closes klappy/oddkit#150).

- docs/oddkit/tools/oddkit_search.md: added knowledge_base_url and result_grouping parameters to the input schema; expanded the response shape with source: "canon" | "baseline" and conditional overlay_hits/baseline_hits; new Result Grouping section explaining the three values and the conditional default.
- docs/oddkit/tools/oddkit_preflight.md: parallel treatment scoped to preflight; expanded the response shape with start_here_overlay/start_here_baseline; canon ref to klappy://canon/principles/scoped-truth.

Summary
Implements the Option D1+D2 hybrid from #150: adds a `result_grouping` parameter to the search and preflight actions that controls how overlay (knowledge-base) and baseline results are ordered.

**Conditional default**

- `knowledge_base_url` unset → default `"merged"` (pure BM25 score order, no behavior change)
- `knowledge_base_url` set → default `"overlay_first"` (overlay docs ranked above baseline — the fix for "Feature Request: Isolate or re-rank knowledge base content vs baseline in search corpus" #150)

**Values**

- `"merged"` — existing behavior unchanged (BM25 score order)
- `"overlay_first"` — stable partition: all `source === "canon"` hits precede all `source === "baseline"` hits; BM25 score order preserved within each partition
- `"grouped"` — response carries `overlay_hits`/`baseline_hits` arrays (search) or `start_here_overlay`/`start_here_baseline` (preflight); merged `hits`/`start_here` still emitted for back-compat

**Changes**

- workers/src/orchestrate.ts: `ResultGrouping` type, `partitionBySource` utility (exported for testing), `runSearch` and `runPreflight` accept `resolvedGrouping`, conditional default computed in the `handleUnifiedAction` dispatcher
- workers/src/index.ts: `result_grouping` schema added to the unified `oddkit` tool, `oddkit_search`, and `oddkit_preflight`; threaded through handler args
- workers/src/telemetry.ts: `parseToolCall` extracts `result_grouping` from tool arguments; blob9 repurposed from the retired `cache_tier` slot
- workers/test/result-grouping.test.mjs
- workers/test/telemetry-integration.test.mjs
- CHANGELOG.md: entry under [Unreleased]

**Test results**

All worker tests pass.

**Notes**

- No changes to workers/src/bm25.ts. Re-ranking is post-BM25 in orchestrate.ts.
- `partitionBySource` uses a single forward pass that pushes to `overlay` or `baseline` arrays — it never calls `.sort()` again.
- Grouped output derives from `orderedHits` (the partitioned order), so `overlay_hits`/`baseline_hits` in grouped mode contain the metadata-enriched objects.
- Docs (docs/oddkit/tools/oddkit_search.md, oddkit_preflight.md) will be updated in a separate PR after this one merges.

Closes #150
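The single-pass stable partition described in the notes above can be sketched like this — a behavioral mirror of what the PR describes, not the exact workers/src/orchestrate.ts source:

```javascript
// One forward pass, no re-sort: BM25 order is preserved inside each bucket.
function partitionBySource(hits) {
  const overlay = [];
  const baseline = [];
  for (const hit of hits) {
    (hit.source === "canon" ? overlay : baseline).push(hit);
  }
  return { overlay, baseline };
}

// overlay_first ordering: overlay block first, score order intact within each
const hits = [
  { path: "docs/a.md", source: "baseline", score: 9 },
  { path: "canon/b.md", source: "canon", score: 7 },
  { path: "docs/c.md", source: "baseline", score: 5 },
];
const { overlay, baseline } = partitionBySource(hits);
console.log([...overlay, ...baseline].map((h) => h.path));
// → ["canon/b.md", "docs/a.md", "docs/c.md"]
```

Because nothing is re-sorted, a lower-scoring overlay doc still outranks every baseline doc, while baseline docs keep their relative BM25 order.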
Note
Medium Risk
Changes search and preflight result ordering and response shape (including a new grouped mode) across both the Cloudflare Worker and Node CLI, plus updates telemetry schema (blob9 repurposed), so regressions could affect ranking, state-threading, or downstream consumers expecting the old shape.
Overview
Adds a `result_grouping` parameter to search and preflight to control overlay vs baseline ordering, with a conditional default of `overlay_first` when the `knowledge_base_url`/baseline override is set and `merged` otherwise.

Updates both Worker and Node CLI implementations to (a) stable-partition results without re-sorting, (b) support `grouped` responses exposing `overlay_hits`/`baseline_hits` (search) and `start_here_overlay`/`start_here_baseline` (preflight), and (c) widen the BM25 candidate pool to 50 when grouping is enabled before truncating to the final 5.

Extends observability by threading `result_grouping` through tool schemas and debug envelopes, and repurposes telemetry `blob9` to record the caller-specified grouping value; adds/updates regression tests covering partitioning, defaults, candidate widening, and telemetry blob shape.

Reviewed by Cursor Bugbot for commit 885fcc9. Bugbot is set up for automated code reviews on this repo.
Fix-forward (post-Cursor-comparison) — 2026-04-28T17:30Z
After comparing this PR against #151 (Cursor's parallel implementation), two real gaps were closed in commit 60fe9b8:

**Gap 1 — Candidate-pool widening (worker + CLI)**

`runSearch` previously called `searchBM25(bm25, input, 5)`, truncating to 5 candidates before the partition. Any overlay doc ranked at BM25 position 6+ would be invisible to the partition. PR #151 caught this by widening to 50 candidates. This fix-forward applies the same approach: when `resolvedGrouping !== "merged"`, pull 50 candidates, partition, then `.slice(0, FINAL_LIMIT)` (5).

**Gap 2 — Node CLI mirror (src/core/actions.js)**

The original PR only changed `workers/src/`. PR #151 also updated the Node CLI's `search` action so the CLI behaves consistently. This fix-forward mirrors the worker logic to `src/core/actions.js`:

- `partitionByOrigin` helper (the CLI uses `origin: "local" | "baseline"` where the worker uses `source: "canon" | "baseline"`)
- conditional default on `result_grouping` (`baseline` set → `overlay_first`, else `merged`)
- candidate-pool widening
- `overlay_hits`/`baseline_hits` arrays for grouped mode
- NO_MATCH branch includes empty grouped arrays when applicable (matching worker commit 63eb8e0)
- `result_grouping` field in the CLI's debug envelope

**Bugbot disposition**
Cursor Bugbot left one Low-severity finding on commit 20b5e41: "When `result_grouping=grouped` and no hits, the NO_MATCH branch omits `overlay_hits`/`baseline_hits`." This was already fixed in commit 63eb8e0 (worker side) before Bugbot's review settled. The CLI now matches this behavior in commit 60fe9b8. Disposition: already-fixed.

**Tests**
Two new regression cases in workers/test/result-grouping.test.mjs validate the candidate-pool widening: "partition surfaces overlay even when overlay is mostly low-score" and "widened pool: 50 candidates partition correctly without losing overlay". All 26 cases in result-grouping.test.mjs pass; full worker test suite (5 files) green; tsc --noEmit clean; CLI smoke (bash tests/smoke.sh) green.

**Independent validator**
A Sonnet 4.6 read-only validator dispatch is queued. The orchestrator will run the canon-mandated independent corroboration (per `klappy://canon/constraints/release-validation-gate` Rule 2) once API credits are available; the validator brief is in the original implementation plan §8.
Per
klappy://canon/constraints/release-validation-gateRule 2 (load-bearing changes toorchestrate.tsrequire an independent validator), an isolated Sonnet 4.6 read-only validator was dispatched against SHA3c8f77f3with no access to the implementation plan.Verdict: PASS. Live preview testing at
feat-result-grouping-overlay-first-oddkit.klappy.workers.devconfirmed all 5 scenarios. The decisive evidence: withknowledge_base_urlset and no explicitresult_grouping, a baseline doc scoring 7.02 ranks below an overlay doc scoring 4.21 — the conditional-defaultoverlay_firstpartition is live. Three consecutive identical curls confirmed zero flakiness.Validator findings
- Fixed in 96ef432: `runPreflight` now applies the conditional default directly instead of `?? "merged"`, so direct importers (not just `src/core/actions.js`-mediated callers) get the correct behavior.
- `src/tasks/preflight.js` is also part of this PR; the full file list as of HEAD is in the validator preview test results — see the actual PR diff for ground truth.
- The gaps closed in 60fe9b8 are documented in the fix-forward section above.

**Final file list (HEAD)**
- CHANGELOG.md
- src/core/actions.js
- src/mcp/orchestrate.js (8cb0b32a)
- src/tasks/preflight.js (8cb0b32a / 3c8f77f3 + validator F-1 fix 96ef432)
- workers/src/index.ts
- workers/src/orchestrate.ts
- workers/src/telemetry.ts
- workers/test/result-grouping.test.mjs
- workers/test/telemetry-integration.test.mjs

**Final test counts**
- result-grouping.test.mjs: 26 / 26 pass (24 original + 2 regression cases for candidate-pool widening)
- tsc --noEmit clean
- CLI smoke green (bash tests/smoke.sh)
- CI checks: success (Workers Builds, Test CF Preview, Version Sync, Creed Freshness; Cursor Bugbot neutral)