feat(kb-scope): default-scope search corpus to overlay + required-baseline (E0008.5)#153
…eline (E0008.5)

When knowledge_base_url is set, the search corpus now defaults to overlay + required-baseline-only (six files from canon's core-governance-baseline §'Required in Baseline'). Callers opt in to the legacy merged corpus via include_full_baseline: true. When knowledge_base_url is unset, the parameter is a no-op and behavior is unchanged.

Affected actions: search, catalog, preflight. orient, get, challenge, validate, gate, encode, audit are unchanged: they read governance via the per-file resolver, not the search index.

Implementation:
- workers/baseline/MANIFEST.json: required-baseline manifest (six paths, Build-Time Invariant #4).
- workers/src/zip-baseline-fetcher.ts:
  - new SearchScope type
  - exported REQUIRED_BASELINE_PATHS Set (synced with MANIFEST.json)
  - getIndex(knowledgeBaseUrl, scope?) filters baseline entries
  - cache key includes effective scope so scoped/merged caches do not poison each other
  - BaselineIndex.search_scope and stats.baseline_indexed surfaced
- workers/src/orchestrate.ts:
  - UnifiedParams.include_full_baseline added
  - resolvedScope derived at dispatch
  - threaded into runSearch/runCatalog/runPreflight
  - search/catalog/preflight debug envelopes emit search_scope, overlay_doc_count, baseline_doc_count
  - catalog assistant_text and result reflect the scoped count and disclose baseline_total
- workers/src/index.ts: include_full_baseline added to the unified oddkit schema and to oddkit_search/oddkit_catalog/oddkit_preflight.
- workers/test/canon-tool-envelope.smoke.mjs: live-smoke assertions for the new envelope fields, scoped default behavior, opt-in to merged, leak prevention against ptxprint-mcp KB, and no-KB no-op.
- Version bump 0.26.0 -> 0.27.0 per governance-change-discipline.md.

Authority: klappy://canon/constraints/core-governance-baseline §'Search-Corpus Boundary' (klappy/klappy.dev #155, merged 2026-04-29).
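The defaulting rule described in this commit message can be sketched in isolation. This is a minimal illustration, not the worker's code: `ScopeParams` and `resolveScope` are hypothetical names introduced here, while the two scope strings and the parameter names are taken from the diff.

```typescript
// Hypothetical standalone sketch of the scope-defaulting rule.
// Only the scope strings and parameter names come from the PR.
type SearchScope = "merged" | "kb_with_required_baseline";

interface ScopeParams {
  knowledge_base_url?: string;
  include_full_baseline?: boolean;
}

// Scoped only when a knowledge base is set AND the caller has not opted in
// to the full baseline; every other combination resolves to "merged".
function resolveScope(params: ScopeParams): SearchScope {
  return params.knowledge_base_url && !params.include_full_baseline
    ? "kb_with_required_baseline"
    : "merged";
}

const kb = "https://github.com/klappy/ptxprint-mcp";

console.log(resolveScope({ knowledge_base_url: kb }));
console.log(resolveScope({ knowledge_base_url: kb, include_full_baseline: true }));
console.log(resolveScope({ include_full_baseline: false }));
```

With no knowledge_base_url the flag has no effect in either direction, which is the no-op behavior the message describes.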
Deploying with

| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! View logs | oddkit | 8e88a9f | Commit Preview URL, Branch Preview URL | Apr 29 2026, 02:07 PM |
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Preflight tool schema missing include_full_baseline parameter
  - Added the optional include_full_baseline boolean to the oddkit_preflight schema so callers can opt back into the legacy merged corpus, matching oddkit_search and oddkit_catalog.
Preview (8e88a9f837)
diff --git a/package.json b/package.json
--- a/package.json
+++ b/package.json
@@ -1,6 +1,6 @@
{
"name": "oddkit",
- "version": "0.26.0",
+ "version": "0.27.0",
"description": "Agent-first CLI for ODD-governed repos. Epistemic terrain rendering with portable baseline.",
"type": "module",
"bin": {
diff --git a/workers/baseline/MANIFEST.json b/workers/baseline/MANIFEST.json
new file mode 100644
--- /dev/null
+++ b/workers/baseline/MANIFEST.json
@@ -1,0 +1,15 @@
+{
+ "$schema": "https://klappy.dev/canon/constraints/core-governance-baseline",
+ "comment": "Required-baseline manifest. The six files every knowledge-base-driven oddkit tool needs to function. Canon source: klappy://canon/constraints/core-governance-baseline §'Required in Baseline'. When knowledge_base_url is set and include_full_baseline is unset/false, the search corpus indexes the project KB plus only these files from the default baseline.",
+ "version": 1,
+ "epoch": "E0008.5",
+ "canon_uri": "klappy://canon/constraints/core-governance-baseline",
+ "required_paths": [
+ "canon/values/orientation.md",
+ "canon/values/axioms.md",
+ "canon/meta/writing-canon.md",
+ "canon/constraints/definition-of-done.md",
+ "canon/constraints/telemetry-governance.md",
+ "odd/challenge/stakes-calibration.md"
+ ]
+}
diff --git a/workers/package.json b/workers/package.json
--- a/workers/package.json
+++ b/workers/package.json
@@ -1,6 +1,6 @@
{
"name": "oddkit-mcp-worker",
- "version": "0.26.0",
+ "version": "0.27.0",
"private": true,
"type": "module",
"scripts": {
diff --git a/workers/src/index.ts b/workers/src/index.ts
--- a/workers/src/index.ts
+++ b/workers/src/index.ts
@@ -222,6 +222,7 @@
knowledge_base_url: z.string().optional().describe("Optional GitHub repo URL for your knowledge base. When set, strict mode is automatic: missing files fall through to the bundled governance tier rather than silently substituting from the default knowledge base."),
result_grouping: z.enum(["merged", "overlay_first", "grouped"]).optional().describe("For action='search' or 'preflight': controls how overlay (knowledge_base) and baseline results are ordered. 'merged' = pure score order (default when knowledge_base_url unset). 'overlay_first' = overlay docs ranked above baseline (default when knowledge_base_url set). 'grouped' = separate overlay_hits/baseline_hits arrays in response."),
include_metadata: z.boolean().optional().describe("When true, search/get responses include a metadata object with full parsed frontmatter. Default: false."),
+ include_full_baseline: z.boolean().optional().describe("Search-Corpus Boundary opt-in (E0008.5). When knowledge_base_url is set, the search corpus defaults to overlay + required-baseline only. Pass true to restore the legacy merged corpus (overlay + full baseline). When knowledge_base_url is unset, this parameter is a no-op. Authority: klappy://canon/constraints/core-governance-baseline §'Search-Corpus Boundary'."),
section: z.string().optional().describe("For action='get': extract only the named ## section from the document. Returns section content or available sections if not found."),
sort_by: z.enum(["date", "path"]).optional().describe("For action='catalog': sort articles. 'date' returns newest first (requires frontmatter). 'path' returns all docs alphabetically, including undated."),
limit: z.number().min(1).max(500).optional().describe("For action='catalog': max articles to return when sort_by is provided. Default: 10, max: 500."),
@@ -244,6 +245,7 @@
knowledge_base_url: args.knowledge_base_url,
result_grouping: args.result_grouping,
include_metadata: args.include_metadata,
+ include_full_baseline: args.include_full_baseline,
section: args.section,
sort_by: args.sort_by,
limit: args.limit,
@@ -325,6 +327,7 @@
knowledge_base_url: z.string().optional().describe("Optional: GitHub repo URL for your knowledge base. When set, strict mode is automatic: missing files fall through to the bundled governance tier."),
result_grouping: z.enum(["merged", "overlay_first", "grouped"]).optional().describe("Controls how overlay (knowledge_base) and baseline results are ordered. 'merged' = pure score order (default when knowledge_base_url unset). 'overlay_first' = overlay docs ranked above baseline (default when knowledge_base_url set). 'grouped' = separate overlay_hits/baseline_hits arrays in response."),
include_metadata: z.boolean().optional().describe("When true, each hit includes a metadata object with full parsed frontmatter. Default: false."),
+ include_full_baseline: z.boolean().optional().describe("Search-Corpus Boundary opt-in (E0008.5). When knowledge_base_url is set, the search corpus defaults to overlay + required-baseline only. Pass true to restore the legacy merged corpus (overlay + full baseline). When knowledge_base_url is unset, this is a no-op. Authority: klappy://canon/constraints/core-governance-baseline §'Search-Corpus Boundary'."),
},
annotations: { readOnlyHint: true, destructiveHint: false, idempotentHint: true, openWorldHint: true },
},
@@ -370,6 +373,7 @@
limit: z.number().min(1).max(500).optional().describe("Max articles to return when sort_by is provided. Default: 10, max: 500."),
offset: z.number().min(0).optional().describe("Skip this many articles before returning results. Use with limit for pagination. Default: 0."),
filter_epoch: z.string().optional().describe("Filter to articles with this epoch value in frontmatter (e.g. 'E0007')."),
+ include_full_baseline: z.boolean().optional().describe("Search-Corpus Boundary opt-in (E0008.5). When knowledge_base_url is set, the catalog reflects overlay + required-baseline only. Pass true to restore the legacy merged catalog (overlay + full baseline). Authority: klappy://canon/constraints/core-governance-baseline §'Search-Corpus Boundary'."),
},
annotations: { readOnlyHint: true, destructiveHint: false, idempotentHint: true, openWorldHint: true },
},
@@ -390,6 +394,7 @@
input: z.string().describe("Description of what you're about to implement."),
knowledge_base_url: z.string().optional().describe("Optional: GitHub repo URL for your knowledge base. When set, strict mode is automatic: missing files fall through to the bundled governance tier."),
result_grouping: z.enum(["merged", "overlay_first", "grouped"]).optional().describe("Controls how overlay (knowledge_base) and baseline start_here results are ordered. 'merged' = pure score order (default when knowledge_base_url unset). 'overlay_first' = overlay docs ranked above baseline (default when knowledge_base_url set). 'grouped' = separate start_here_overlay/start_here_baseline arrays."),
+ include_full_baseline: z.boolean().optional().describe("Search-Corpus Boundary opt-in (E0008.5). When knowledge_base_url is set, the preflight corpus defaults to overlay + required-baseline only. Pass true to restore the legacy merged corpus (overlay + full baseline). When knowledge_base_url is unset, this is a no-op. Authority: klappy://canon/constraints/core-governance-baseline §'Search-Corpus Boundary'."),
},
annotations: { readOnlyHint: true, destructiveHint: false, idempotentHint: true, openWorldHint: true },
},
@@ -438,6 +443,7 @@
knowledge_base_url: args.knowledge_base_url as string | undefined,
result_grouping: args.result_grouping as "merged" | "overlay_first" | "grouped" | undefined,
include_metadata: args.include_metadata as boolean | undefined,
+ include_full_baseline: args.include_full_baseline as boolean | undefined,
section: args.section as string | undefined,
sort_by: args.sort_by as string | undefined,
limit: args.limit as number | undefined,
diff --git a/workers/src/orchestrate.ts b/workers/src/orchestrate.ts
--- a/workers/src/orchestrate.ts
+++ b/workers/src/orchestrate.ts
@@ -15,6 +15,7 @@
type Env,
type BaselineIndex,
type IndexEntry,
+ type SearchScope,
type SectionResult,
} from "./zip-baseline-fetcher";
import { buildBM25Index, searchBM25, tokenize, type BM25Index } from "./bm25";
@@ -235,6 +236,14 @@
knowledge_base_url?: string;
result_grouping?: ResultGrouping;
include_metadata?: boolean;
+ /**
+ * Search-Corpus Boundary opt-in (E0008.5). When `knowledge_base_url` is set,
+ * the search corpus defaults to overlay + required-baseline-manifest only.
+ * Set this to true to restore the legacy merged corpus (overlay + full
+ * baseline). When `knowledge_base_url` is unset, this parameter is a no-op.
+ * Authority: klappy://canon/constraints/core-governance-baseline §"Search-Corpus Boundary".
+ */
+ include_full_baseline?: boolean;
section?: string;
sort_by?: string;
limit?: number;
@@ -1348,9 +1357,10 @@
state?: OddkitState,
includeMetadata?: boolean,
resolvedGrouping: ResultGrouping = "merged",
+ searchScope: SearchScope = "merged",
): Promise<ActionResult> {
const startMs = Date.now();
- const index = await fetcher.getIndex(knowledgeBaseUrl);
+ const index = await fetcher.getIndex(knowledgeBaseUrl, searchScope);
const bm25 = getBM25Index(index.entries);
// Issue #150 fix-forward: when grouping is active, retrieve a wider candidate
@@ -1412,6 +1422,9 @@
baseline_url: index.baseline_url,
knowledge_base_url: knowledgeBaseUrl,
search_index_size: bm25.N,
+ search_scope: index.search_scope,
+ overlay_doc_count: index.stats.canon,
+ baseline_doc_count: index.stats.baseline_indexed ?? index.stats.baseline,
result_grouping: resolvedGrouping,
duration_ms: Date.now() - startMs,
generated_at: new Date().toISOString(),
@@ -1499,6 +1512,9 @@
baseline_url: index.baseline_url,
knowledge_base_url: knowledgeBaseUrl,
search_index_size: bm25.N,
+ search_scope: index.search_scope,
+ overlay_doc_count: index.stats.canon,
+ baseline_doc_count: index.stats.baseline_indexed ?? index.stats.baseline,
result_grouping: resolvedGrouping,
duration_ms: Date.now() - startMs,
generated_at: new Date().toISOString(),
@@ -2247,9 +2263,10 @@
knowledgeBaseUrl?: string,
state?: OddkitState,
options?: { sort_by?: string; limit?: number; offset?: number; filter_epoch?: string },
+ searchScope: SearchScope = "merged",
): Promise<ActionResult> {
const startMs = Date.now();
- const index = await fetcher.getIndex(knowledgeBaseUrl);
+ const index = await fetcher.getIndex(knowledgeBaseUrl, searchScope);
const { sort_by, limit: rawLimit, offset: rawOffset, filter_epoch } = options || {};
const effectiveLimit = Math.min(Math.max(rawLimit || 10, 1), 500);
const effectiveOffset = Math.max(rawOffset || 0, 0);
@@ -2315,10 +2332,16 @@
}));
}
+ const baselineCount = index.stats.baseline_indexed ?? index.stats.baseline;
+ const scopeNote =
+ index.search_scope === "kb_with_required_baseline"
+ ? ` [scoped: required-baseline only; pass include_full_baseline=true to merge]`
+ : "";
+
const assistantTextParts = [
`ODD Documentation Catalog`,
``,
- `Total: ${index.stats.total} docs (${index.stats.canon} canon, ${index.stats.baseline} baseline)`,
+ `Total: ${index.stats.total} docs (${index.stats.canon} canon, ${baselineCount} baseline)${scopeNote}`,
knowledgeBaseUrl ? `Canon override: ${knowledgeBaseUrl}` : "",
``,
`Start here:`,
@@ -2352,7 +2375,8 @@
const result: Record<string, unknown> = {
total: index.stats.total,
canon: index.stats.canon,
- baseline: index.stats.baseline,
+ baseline: baselineCount,
+ baseline_total: index.stats.baseline,
categories: Object.keys(byTag),
start_here: startHere.map((e) => e.path),
};
@@ -2376,6 +2400,9 @@
debug: {
knowledge_base_url: knowledgeBaseUrl,
baseline_url: index.baseline_url,
+ search_scope: index.search_scope,
+ overlay_doc_count: index.stats.canon,
+ baseline_doc_count: index.stats.baseline_indexed ?? index.stats.baseline,
generated_at: new Date().toISOString(), // response time — consistent with all other handlers
index_built_at: index.generated_at, // preserve cache-freshness diagnostic under accurate name
duration_ms: Date.now() - startMs,
@@ -2389,9 +2416,10 @@
knowledgeBaseUrl?: string,
state?: OddkitState,
resolvedGrouping: ResultGrouping = "merged",
+ searchScope: SearchScope = "merged",
): Promise<ActionResult> {
const startMs = Date.now();
- const index = await fetcher.getIndex(knowledgeBaseUrl);
+ const index = await fetcher.getIndex(knowledgeBaseUrl, searchScope);
const topic = message.replace(/^preflight:\s*/i, "").trim();
// Score all entries, then apply partition before slicing
@@ -2453,6 +2481,9 @@
debug: {
docs_considered: index.entries.length,
knowledge_base_url: knowledgeBaseUrl,
+ search_scope: index.search_scope,
+ overlay_doc_count: index.stats.canon,
+ baseline_doc_count: index.stats.baseline_indexed ?? index.stats.baseline,
result_grouping: resolvedGrouping,
duration_ms: Date.now() - startMs,
generated_at: new Date().toISOString(),
@@ -3333,7 +3364,7 @@
] as const;
export async function handleUnifiedAction(params: UnifiedParams): Promise<OddkitEnvelope> {
- const { action, input, context, mode, knowledge_base_url, result_grouping, include_metadata, section, sort_by, limit, offset, filter_epoch, state, env, tracer } = params;
+ const { action, input, context, mode, knowledge_base_url, result_grouping, include_metadata, include_full_baseline, section, sort_by, limit, offset, filter_epoch, state, env, tracer } = params;
// Conditional default: when knowledge_base_url is set and caller didn't
// specify result_grouping, default to "overlay_first" (the fix for #150).
@@ -3341,6 +3372,16 @@
const resolvedGrouping: ResultGrouping =
result_grouping ?? (knowledge_base_url ? "overlay_first" : "merged");
+ // Search-Corpus Boundary (E0008.5): when knowledge_base_url is set, the
+ // search corpus defaults to overlay + required-baseline only. Callers opt
+ // in to the legacy merged corpus via include_full_baseline=true. When
+ // knowledge_base_url is unset, the parameter is a no-op and scope is
+ // forced to "merged" (the baseline IS the canon — there is nothing to
+ // scope away). Authority: klappy://canon/constraints/core-governance-baseline
+ // §"Search-Corpus Boundary".
+ const resolvedScope: SearchScope =
+ knowledge_base_url && !include_full_baseline ? "kb_with_required_baseline" : "merged";
+
if (!VALID_ACTIONS.includes(action as (typeof VALID_ACTIONS)[number])) {
return {
action: "error",
@@ -3371,7 +3412,7 @@
result = await runEncodeAction(input, context, fetcher, knowledge_base_url, state);
break;
case "search":
- result = await runSearch(input, fetcher, knowledge_base_url, state, include_metadata, resolvedGrouping);
+ result = await runSearch(input, fetcher, knowledge_base_url, state, include_metadata, resolvedGrouping, resolvedScope);
break;
case "get":
result = await runGet(input, fetcher, knowledge_base_url, state, include_metadata, section);
@@ -3383,13 +3424,13 @@
result = await runAudit(input, fetcher, knowledge_base_url, state);
break;
case "catalog":
- result = await runCatalog(fetcher, knowledge_base_url, state, { sort_by, limit, offset, filter_epoch });
+ result = await runCatalog(fetcher, knowledge_base_url, state, { sort_by, limit, offset, filter_epoch }, resolvedScope);
break;
case "validate":
result = await runValidate(input, state);
break;
case "preflight":
- result = await runPreflight(input, fetcher, knowledge_base_url, state, resolvedGrouping);
+ result = await runPreflight(input, fetcher, knowledge_base_url, state, resolvedGrouping, resolvedScope);
break;
case "version":
result = runVersion(env);
@@ -3398,7 +3439,7 @@
result = await runCleanupStorage(fetcher, knowledge_base_url);
break;
default:
- result = await runSearch(input, fetcher, knowledge_base_url, state, undefined, resolvedGrouping);
+ result = await runSearch(input, fetcher, knowledge_base_url, state, undefined, resolvedGrouping, resolvedScope);
}
// Inject trace into debug envelope (E0008.1)
diff --git a/workers/src/zip-baseline-fetcher.ts b/workers/src/zip-baseline-fetcher.ts
--- a/workers/src/zip-baseline-fetcher.ts
+++ b/workers/src/zip-baseline-fetcher.ts
@@ -29,6 +29,35 @@
// old code persists until the repo's commit SHA changes.
const INDEX_VERSION = "2.4"; // 2.4: Cache API migration, KV removal, x-ray tracing (E0008)
+/**
+ * Search corpus scope (E0008.5 — Search-Corpus Boundary).
+ *
+ * "merged" — overlay + full baseline (legacy default; default
+ * when knowledge_base_url is unset).
+ * "kb_with_required_baseline" — overlay + only the required-baseline manifest
+ * paths from the default baseline. Default when
+ * knowledge_base_url is set and the caller has
+ * not opted in via include_full_baseline=true.
+ *
+ * Canon authority for the boundary and the required-baseline list:
+ * klappy://canon/constraints/core-governance-baseline §"Search-Corpus Boundary"
+ *
+ * Manifest source of truth (ships in workers/baseline/MANIFEST.json):
+ * The six paths below MUST stay in sync with workers/baseline/MANIFEST.json.
+ * The manifest is the canon-anchored record; this const is the runtime filter
+ * used by the worker to avoid a JSON resolution step on the hot path.
+ */
+export type SearchScope = "merged" | "kb_with_required_baseline";
+
+export const REQUIRED_BASELINE_PATHS: ReadonlySet<string> = new Set([
+ "canon/values/orientation.md",
+ "canon/values/axioms.md",
+ "canon/meta/writing-canon.md",
+ "canon/constraints/definition-of-done.md",
+ "canon/constraints/telemetry-governance.md",
+ "odd/challenge/stakes-calibration.md",
+]);
+
export interface Env {
DEFAULT_KNOWLEDGE_BASE_URL: string;
ODDKIT_VERSION: string;
@@ -93,8 +122,20 @@
stats: {
total: number;
canon: number;
+ /** Total baseline files discovered in the default baseline repo. */
baseline: number;
+ /**
+ * Baseline files actually included in the search index, after scope filtering.
+ * Equal to `baseline` when scope is "merged"; equal to the number of
+ * required-baseline paths present when scope is "kb_with_required_baseline".
+ */
+ baseline_indexed?: number;
};
+ /**
+ * Search corpus scope under which this index was built (E0008.5).
+ * Absent on indexes built before the field was introduced.
+ */
+ search_scope?: SearchScope;
commit_sha?: string;
canon_commit_sha?: string;
}
@@ -881,11 +922,16 @@
* removing one KV read). Content-addressed by SHA — no TTL needed
* for correctness. Module cache uses 5-min TTL for freshness.
*/
- async getIndex(knowledgeBaseUrl?: string): Promise<BaselineIndex> {
+ async getIndex(knowledgeBaseUrl?: string, scope: SearchScope = "merged"): Promise<BaselineIndex> {
const baselineRepoUrl = "https://github.com/klappy/klappy.dev";
+ // Effective scope: scoping only matters when an overlay is set.
+ // When no knowledge_base_url, the baseline IS the canon — there is
+ // nothing to scope away — so force "merged" regardless of caller intent.
+ const effectiveScope: SearchScope = knowledgeBaseUrl ? scope : "merged";
+
// Step 0: Module-level memory cache (0ms, 5-min TTL)
- const expectedKey = `v${INDEX_VERSION}/${getCacheKey(knowledgeBaseUrl || "default")}`;
+ const expectedKey = `v${INDEX_VERSION}/${getCacheKey(knowledgeBaseUrl || "default")}_scope-${effectiveScope}`;
if (cachedIndex && cachedIndexKey === expectedKey && Date.now() - indexCachedAt < MODULE_CACHE_TTL_MS) {
this.tracer?.recordFetch({ url: `memory://index/${expectedKey}`, duration_ms: 0, cached: true });
return cachedIndex;
@@ -896,8 +942,10 @@
const canonRef = knowledgeBaseUrl ? extractBranchRef(knowledgeBaseUrl) : undefined;
const canonSha = knowledgeBaseUrl ? await this.getLatestCommitSha(knowledgeBaseUrl, canonRef) : undefined;
- // Content-addressed cache key: SHA + version
- const shaKey = `${baselineSha || "unknown"}_${canonSha || "none"}`;
+ // Content-addressed cache key: SHA + version + scope.
+ // Scope is part of the key so a scoped index and a merged index against
+ // the same KB do not poison each other's cached form.
+ const shaKey = `${baselineSha || "unknown"}_${canonSha || "none"}_${effectiveScope}`;
const cacheKey = `index/v${INDEX_VERSION}/${getCacheKey(knowledgeBaseUrl || "default")}_${shaKey}`;
// Step 2: Cache API (~1ms edge read) — cacheGet records the cf-cache:// fetch.
@@ -973,8 +1021,19 @@
canonEntries = await this.buildIndexFromRepo(knowledgeBaseUrl, "canon", skipCache);
}
+ // Search-Corpus Boundary (E0008.5): when scoped, restrict baseline entries
+ // to only the required-baseline manifest before arbitration. Per
+ // klappy://canon/constraints/core-governance-baseline §"Search-Corpus
+ // Boundary", this preserves required-baseline as the floor while excluding
+ // co-located canon-only content (writings/, apocrypha/, odd/ledger/, etc.)
+ // that would otherwise outrank the project KB's own canon in BM25.
+ const scopedBaselineEntries =
+ effectiveScope === "kb_with_required_baseline"
+ ? baselineEntries.filter((e) => REQUIRED_BASELINE_PATHS.has(e.path))
+ : baselineEntries;
+
// Arbitrate — canon overrides baseline
- const allEntries = this.arbitrateEntries(canonEntries, baselineEntries);
+ const allEntries = this.arbitrateEntries(canonEntries, scopedBaselineEntries);
const index: BaselineIndex = {
version: INDEX_VERSION,
@@ -986,7 +1045,9 @@
total: allEntries.length,
canon: canonEntries.length,
baseline: baselineEntries.length,
+ baseline_indexed: scopedBaselineEntries.length,
},
+ search_scope: effectiveScope,
commit_sha: baselineSha || undefined,
canon_commit_sha: canonSha || undefined,
};
diff --git a/workers/test/canon-tool-envelope.smoke.mjs b/workers/test/canon-tool-envelope.smoke.mjs
--- a/workers/test/canon-tool-envelope.smoke.mjs
+++ b/workers/test/canon-tool-envelope.smoke.mjs
@@ -717,6 +717,99 @@
typeof catalogResult.debug?.index_built_at === "string",
`got: ${catalogResult.debug?.index_built_at}`);
+ // ── Search-Corpus Boundary (E0008.5) ───────────────────────────────────────
+ // Asserts that when knowledge_base_url is set, the default scope filters the
+ // baseline to required-baseline only; that include_full_baseline=true
+ // restores the merged corpus; and that envelope fields surface scope.
+ // Authority: klappy://canon/constraints/core-governance-baseline §"Search-Corpus Boundary"
+ console.log(`\n─── Search-Corpus Boundary: catalog default scope ───`);
+ const PTXPRINT_KB = "https://github.com/klappy/ptxprint-mcp";
+ const scopedCatalog = await callTool("oddkit_catalog", { knowledge_base_url: PTXPRINT_KB });
+ expectFullEnvelope("oddkit_catalog (scoped)", scopedCatalog);
+
+ ok(`scoped catalog: debug.search_scope === "kb_with_required_baseline"`,
+ scopedCatalog.debug?.search_scope === "kb_with_required_baseline",
+ `got: ${scopedCatalog.debug?.search_scope}`);
+ ok(`scoped catalog: debug.overlay_doc_count present and > 0`,
+ typeof scopedCatalog.debug?.overlay_doc_count === "number" && scopedCatalog.debug.overlay_doc_count > 0,
+ `got: ${scopedCatalog.debug?.overlay_doc_count}`);
+ ok(`scoped catalog: debug.baseline_doc_count <= 6 (required-baseline ceiling)`,
+ typeof scopedCatalog.debug?.baseline_doc_count === "number" && scopedCatalog.debug.baseline_doc_count <= 6,
+ `got: ${scopedCatalog.debug?.baseline_doc_count}`);
+ ok(`scoped catalog: result.baseline reflects scoped count (= debug.baseline_doc_count)`,
+ typeof scopedCatalog.result?.baseline === "number" &&
+ scopedCatalog.result.baseline === scopedCatalog.debug?.baseline_doc_count,
+ `result.baseline=${scopedCatalog.result?.baseline} debug.baseline_doc_count=${scopedCatalog.debug?.baseline_doc_count}`);
+ ok(`scoped catalog: result.baseline_total >= result.baseline (full repo count disclosed)`,
+ typeof scopedCatalog.result?.baseline_total === "number" &&
+ scopedCatalog.result.baseline_total >= scopedCatalog.result.baseline,
+ `baseline_total=${scopedCatalog.result?.baseline_total} baseline=${scopedCatalog.result?.baseline}`);
+
+ console.log(`\n─── Search-Corpus Boundary: catalog include_full_baseline opt-in ───`);
+ const mergedCatalog = await callTool("oddkit_catalog", {
+ knowledge_base_url: PTXPRINT_KB,
+ include_full_baseline: true,
+ });
+ expectFullEnvelope("oddkit_catalog (merged)", mergedCatalog);
+
+ ok(`merged catalog: debug.search_scope === "merged"`,
+ mergedCatalog.debug?.search_scope === "merged",
+ `got: ${mergedCatalog.debug?.search_scope}`);
+ ok(`merged catalog: baseline_doc_count is full baseline (much greater than scoped)`,
+ typeof mergedCatalog.debug?.baseline_doc_count === "number" &&
+ mergedCatalog.debug.baseline_doc_count > (scopedCatalog.debug?.baseline_doc_count ?? 0) + 50,
+ `merged=${mergedCatalog.debug?.baseline_doc_count} scoped=${scopedCatalog.debug?.baseline_doc_count}`);
+
+ console.log(`\n─── Search-Corpus Boundary: search default scope ───`);
+ // Negative-control query: this term lives only in klappy.dev's canon, not
+ // ptxprint-mcp's. Under scoped default, klappy.dev hits must NOT surface.
+ const scopedSearch = await callTool("oddkit_search", {
+ input: "release validation gate Bugbot Sonnet validator",
+ knowledge_base_url: PTXPRINT_KB,
+ });
+ expectFullEnvelope("oddkit_search (scoped, klappy.dev-only term)", scopedSearch);
+
+ ok(`scoped search: debug.search_scope === "kb_with_required_baseline"`,
+ scopedSearch.debug?.search_scope === "kb_with_required_baseline",
+ `got: ${scopedSearch.debug?.search_scope}`);
+ ok(`scoped search: debug.search_index_size <= overlay_count + 6`,
+ typeof scopedSearch.debug?.search_index_size === "number" &&
+ typeof scopedSearch.debug?.overlay_doc_count === "number" &&
+ scopedSearch.debug.search_index_size <= scopedSearch.debug.overlay_doc_count + 6,
+ `index_size=${scopedSearch.debug?.search_index_size} overlay=${scopedSearch.debug?.overlay_doc_count}`);
+ // Klappy.dev release-validation-gate doc must NOT appear in scoped hits.
+ const scopedHitPaths = (scopedSearch.result?.hits || []).map((h) => h.path || "");
+ const leakedReleaseGate = scopedHitPaths.some((p) =>
+ p.includes("canon/constraints/release-validation-gate"),
+ );
+ ok(`scoped search: klappy.dev-only doc 'release-validation-gate' does NOT leak into hits`,
+ !leakedReleaseGate,
+ `leak detected in: ${scopedHitPaths.join(", ")}`);
+
+ console.log(`\n─── Search-Corpus Boundary: search include_full_baseline opt-in ───`);
+ const mergedSearch = await callTool("oddkit_search", {
+ input: "release validation gate Bugbot Sonnet validator",
+ knowledge_base_url: PTXPRINT_KB,
+ include_full_baseline: true,
+ });
+ expectFullEnvelope("oddkit_search (merged)", mergedSearch);
+
+ ok(`merged search: debug.search_scope === "merged"`,
+ mergedSearch.debug?.search_scope === "merged",
+ `got: ${mergedSearch.debug?.search_scope}`);
+ ok(`merged search: search_index_size strictly greater than scoped`,
+ typeof mergedSearch.debug?.search_index_size === "number" &&
+ mergedSearch.debug.search_index_size > (scopedSearch.debug?.search_index_size ?? 0),
+ `merged=${mergedSearch.debug?.search_index_size} scoped=${scopedSearch.debug?.search_index_size}`);
+
+ console.log(`\n─── Search-Corpus Boundary: search no-KB is no-op ───`);
+ // When knowledge_base_url is unset, the parameter must be a no-op and scope
+ // must be "merged" (the baseline IS the canon).
+ const defaultSearch = await callTool("oddkit_search", { input: "axioms" });
+ ok(`default search (no KB): debug.search_scope === "merged"`,
+ defaultSearch.debug?.search_scope === "merged",
+ `got: ${defaultSearch.debug?.search_scope}`);
+
console.log(`\n${passed} passed, ${failed} failed`);
process.exit(failed === 0 ? 0 : 1);
}
Reviewed by Cursor Bugbot for commit b533bfc.
Independent RV-gate validator, VERDICT: APPROVE_WITH_NOTES

Per

Validator: Claude Sonnet 4.6 in a fresh Managed Agent session (no shared context with the implementing session). Session ID:

All nine validation questions answered ✅:

Non-blocking note

The receipts table in the PR body has numeric drift vs. current live state.

Full DOLCHE ledger

PR #153 Validator DOLCHE Ledger

Validator: Independent fresh-context session (Claude Sonnet 4.5)

Q1: Implementation Matches Canon Contract

[O] PR #153 implements E0008.5 (Search-Corpus Boundary). Two commits:
[O] Default scope when
[O]
[O] When
[O]
[O]

VERDICT Q1: PASS

Q2: Cache Key Includes Scope

[O] Cache key in
This is

VERDICT Q2: PASS

Q3: Runtime Invariant #5 Preserved

[O]
[O] Scoping is applied only to the

    const scopedBaselineEntries =
      effectiveScope === "kb_with_required_baseline"
        ? baselineEntries.filter((e) => REQUIRED_BASELINE_PATHS.has(e.path))
        : baselineEntries;

VERDICT Q3: PASS

Q4: Per-File Resolution Unchanged

[O] Grep for diff changes to
[O]
[O] Governance reads in these tools still use

VERDICT Q4: PASS

Q5: Telemetry/Envelope Discipline

[O] Debug envelope additions verified in diff:
[O]
[O] Live verification (Probe A):

VERDICT Q5: PASS

Q6: Live Behavior Matches Smoke Claims (Receipts Table)

[O] Probe A (preview scoped catalog,
[O] Probe B (preview merged catalog,
[O] Probe C (preview scoped search, "release validation gate Bugbot Sonnet validator"):
[O] Probe D (prod v0.26.0, same KB):
[O] Receipts table discrepancy (NOTE): PR body claimed

VERDICT Q6: PASS (with note on stale receipts numbers)

Q7: Smoke Test Passes

[O] Command:
Result: 260 passed, 0 failed
New E0008.5 tests (lines 720-812) exercised:

VERDICT Q7: PASS (260 passed, 0 failed)

Q8: Typecheck Passes

[O] Command:
Output:
Exit code: 0. No errors. ✅

VERDICT Q8: PASS (0 TypeScript errors)

Q9: No Unrelated Changes

[O] Full diff stat: 7 files, 233 insertions, 17 deletions. Files changed:
[O] Grep for unrelated changes (orchestrate.ts diff lines not mentioning scope/baseline/E0008.5): empty; every changed line in orchestrate.ts is directly E0008.5 scope-related.

VERDICT Q9: PASS (no unrelated changes)

Final Verdict

[D] VERDICT: APPROVE_WITH_NOTES

All nine validation questions answered affirmatively with evidence. The implementation correctly implements E0008.5 as specified in canon §"Search-Corpus Boundary". One note:

NOTE (non-blocking): The PR body receipts table has numeric drift vs. current live state. The

Evidence summary:

Cleared to merge.

Summary
Default-scopes the search corpus to overlay + required-baseline when knowledge_base_url is set. Adds include_full_baseline: true as the explicit opt-in to the prior merged behavior. Closes the visibility gap measured against klappy/ptxprint-mcp and operationalizes the canon contract added in klappy/klappy.dev#155 (merged 2026-04-29).

This is the code-side of E0008.5 (Search-Corpus Boundary, Project-KB Visibility). Canon merged first per the operator's directive.
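
The default-scoping rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `resolveScope`, `scopeBaseline`, `ScopeParams`, `Entry`, and the two placeholder paths are hypothetical names introduced here. Only the `SearchScope` values, the `REQUIRED_BASELINE_PATHS` name, and the filter expression come from the PR text.

```typescript
// Sketch only: helper names and bodies are assumptions, not the PR's code.
type SearchScope = "merged" | "kb_with_required_baseline";

interface ScopeParams {
  knowledge_base_url?: string;
  include_full_baseline?: boolean;
}

// Without a knowledge base the flag is a no-op and the scope stays "merged".
// With one, the scoped corpus is the default and the flag opts back in.
function resolveScope(params: ScopeParams): SearchScope {
  if (!params.knowledge_base_url) return "merged";
  return params.include_full_baseline === true
    ? "merged"
    : "kb_with_required_baseline";
}

interface Entry {
  path: string;
}

// Placeholder paths for illustration only; the real six paths live in
// workers/baseline/MANIFEST.json and are not reproduced here.
const REQUIRED_BASELINE_PATHS: ReadonlySet<string> = new Set([
  "example/required-a.md",
  "example/required-b.md",
]);

// Keep only required-baseline entries when scoped; pass through when merged.
function scopeBaseline(entries: Entry[], scope: SearchScope): Entry[] {
  return scope === "kb_with_required_baseline"
    ? entries.filter((e) => REQUIRED_BASELINE_PATHS.has(e.path))
    : entries;
}
```

Deriving the scope once and passing it down keeps the "no KB means merged" rule in a single place, so downstream code only has to branch on the scope value.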
Affected actions: oddkit_search, oddkit_catalog, oddkit_preflight. orient, get, challenge, validate, gate, encode, audit are unchanged: they read governance via the per-file resolver, not the search index.

Smoke results (preview)

    ODDKIT_URL=https://feat-kb-scope-isolation-oddkit.klappy.workers.dev/mcp node workers/test/canon-tool-envelope.smoke.mjs

12 new Search-Corpus Boundary assertions added; full pre-existing suite still green.
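
For readers who want a feel for what a scoped-default envelope check might look like, here is a hedged sketch. It is not taken from canon-tool-envelope.smoke.mjs. The `DebugEnvelope` shape, the `checkScopedDefault` helper, and the "overlay count is positive" rule are assumptions made for illustration. The field names and the six-path limit come from the PR text.

```typescript
// Sketch only: the envelope shape and this helper are assumptions.
type SearchScope = "merged" | "kb_with_required_baseline";

interface DebugEnvelope {
  search_scope: SearchScope;
  overlay_doc_count: number;
  baseline_doc_count: number;
}

// The PR text describes the required baseline as six paths.
const REQUIRED_BASELINE_SIZE = 6;

// Returns a list of problems; an empty list means the envelope looks like
// a scoped-default response under the assumptions above.
function checkScopedDefault(debug: DebugEnvelope): string[] {
  const problems: string[] = [];
  if (debug.search_scope !== "kb_with_required_baseline") {
    problems.push(`unexpected search_scope: ${debug.search_scope}`);
  }
  if (debug.baseline_doc_count > REQUIRED_BASELINE_SIZE) {
    problems.push(
      `baseline_doc_count ${debug.baseline_doc_count} exceeds ${REQUIRED_BASELINE_SIZE}`,
    );
  }
  if (debug.overlay_doc_count <= 0) {
    // Assumption: a configured KB contributes at least one document.
    problems.push("overlay_doc_count should be positive when a KB is set");
  }
  return problems;
}
```

Collecting problems into a list, rather than throwing on the first one, lets a single probe report every mismatch at once.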
Direct probes against klappy/ptxprint-mcp (preview vs. prod)
[Results table not preserved; surviving labels: include_full_baseline:true, release-validation-gate.md in negative-control top hits]

Cache key carries the scope dimension: preview trace shows distinct entries ..._kb_with_required_baseline and ..._merged against the same KB SHA. They do not poison each other.

(Overlay count reflects ptxprint-mcp's current canon, which grew during the day; the previous 21 was measured this morning. The contamination shape is unchanged.)
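
The reason scope has to be part of the cache key can be shown with a small in-memory sketch. The `index_` prefix, `indexCacheKey`, and `IndexCache` are hypothetical. Only the scope suffixes match the entries seen in the preview trace, and the real key layout may differ.

```typescript
// Sketch only: key layout and class are assumptions for illustration.
type SearchScope = "merged" | "kb_with_required_baseline";

// Hypothetical layout: "index_<kbSha>_<scope>".
function indexCacheKey(kbSha: string, scope: SearchScope): string {
  return `index_${kbSha}_${scope}`;
}

class IndexCache<T> {
  private readonly store = new Map<string, T>();

  get(kbSha: string, scope: SearchScope): T | undefined {
    return this.store.get(indexCacheKey(kbSha, scope));
  }

  set(kbSha: string, scope: SearchScope, value: T): void {
    this.store.set(indexCacheKey(kbSha, scope), value);
  }

  get size(): number {
    return this.store.size;
  }
}
```

If the key were built from the KB SHA alone, whichever scope was indexed first would be served to both kinds of callers. With the scope in the key, the same SHA yields two independent entries.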
What changed
- workers/baseline/MANIFEST.json: required-baseline manifest, six paths, points at canon §"Required in Baseline." Closes Build-Time Invariant #4.
- workers/src/zip-baseline-fetcher.ts:
  - SearchScope type ("merged" | "kb_with_required_baseline")
  - REQUIRED_BASELINE_PATHS: ReadonlySet<string> (synced with MANIFEST.json)
  - BaselineIndex.search_scope and stats.baseline_indexed surfaced
  - getIndex(knowledgeBaseUrl, scope?) filters baseline against the manifest when scoped
- workers/src/orchestrate.ts:
  - UnifiedParams.include_full_baseline?: boolean
  - resolvedScope derived at dispatch and threaded into runSearch / runCatalog / runPreflight
  - search/catalog/preflight debug envelopes emit search_scope, overlay_doc_count, baseline_doc_count
  - catalog assistant_text and result reflect the scoped count and disclose baseline_total
- workers/src/index.ts: include_full_baseline added to the unified oddkit schema and to oddkit_search / oddkit_catalog / oddkit_preflight
- workers/test/canon-tool-envelope.smoke.mjs: 12 live-smoke assertions covering scoped default, opt-in to merged, leak prevention against ptxprint-mcp, and no-KB no-op
- Version bump 0.26.0 -> 0.27.0 per governance-change-discipline.md

What is NOT changed
- […] baseline path is never user-configurable). The baseline floor is unchanged; only the search index is scope-sensitive.
- […] oddkit_get, governance reads). klappy://canon/principles/scoped-truth still resolves; it just stops competing for ranking slots in a project KB's own searches.
- […] orient behavior (per the canon's "Affected Tools" table).
- […] knowledge_base_url, effectiveScope is forced to merged and behavior is identical to today.
- […] result_grouping from #150). That is a separate, surgical change once a schema decision is made.

Release validation gate
This PR touches orchestrate.ts (governance reads, envelope behavior) and zip-baseline-fetcher.ts (index composition). Per klappy://canon/constraints/release-validation-gate (tier 1), promotion to prod requires:
- […] completed (not in_progress). Will wait.

Re-validation against prod will run the same three probes (catalog total, search index size, negative-control leak) after promotion to confirm parity with preview.
Linked
- core-governance-baseline.md.
- klappy/ptxprint-mcp canon/handoffs/oddkit-kb-isolation-feature-request.md.
- klappy://canon/principles/scoped-truth, klappy://canon/principles/dry-canon-says-it-once.

Note
Medium Risk
Changes the default document corpus and caching behavior for key discovery actions when knowledge_base_url is used, which can materially change results and counts for clients. Scope is opt-in reversible, but any mismatch between the manifest/constant paths or cache keying could cause surprising indexing output.

Overview
When knowledge_base_url is set, the worker now defaults the searchable corpus to the overlay knowledge base plus a required-baseline subset (instead of overlay + full baseline) for search, catalog, and preflight, and introduces include_full_baseline=true to explicitly opt back into the legacy merged corpus.

This adds a required-baseline manifest (workers/baseline/MANIFEST.json), implements scoped indexing and scope-aware cache keys in KnowledgeBaseFetcher.getIndex, threads the scope through the unified/individual tool schemas, and surfaces new debug/result fields (search_scope, overlay_doc_count, baseline_doc_count, baseline_total) with expanded smoke tests asserting leak-prevention and opt-in behavior. Versions are bumped to 0.27.0.

Reviewed by Cursor Bugbot for commit 8e88a9f.