fix(error-lookup): suppress weak catalog when semantic returned useful results + content-thin chunk filter by critesjosh · Pull Request #22 · AztecProtocol/mcp-server

critesjosh · 2026-05-04T16:12:23Z

Summary

Companion to critesjosh/docsgpt-aztec#66 (which filters content-thin apiref chunks server-side). This PR adds the matching client-side fixes:

- Defense-in-depth filter that drops path-only / empty-body semantic chunks even if the connected DocsGPT instance hasn't been updated.
- Catalog suppression that hides weak fuzzy hints from the rendered output when semantic returned content-bearing results.

Background

The v1.21 dogfood test reported `aztec_lookup_error("note already nullified")` as "the same bogus result". Empirical investigation showed the threshold fix from PR #20 was working (semantic search was firing), but two compounding issues made the response look broken:

- Semantic returned 3 chunks; 2 of them were just file paths (`note_existence_request.nr`, `utils.nr`) with no body content. The user saw an apparently empty `## Related Documentation` section.
- The score-54 "Contract already initialized" catalog hint stayed visible under `## Lower-Confidence Catalog Hints`. The user remembered it from v1.20 and concluded "unchanged".

Fix

Part 1 — `isUsefulSemanticChunk` filter

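New helper in `src/tools/error-lookup.ts`. Mirrors the Python helper `_is_empty_apiref_chunk` in docsgpt: drops chunks whose body is empty or path-only, where a line counts as path-shaped if, after stripping any leading markdown heading marker (`#`, `##`, ...), it contains `/` and no whitespace. Critically, legitimate signature-only chunks survive: the filter inspects content shape (whitespace presence), not length, so a line like `pub fn poseidon(input: [Field; N]) -> Field` never trips the path-only test.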
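A minimal TypeScript sketch of the path-only rule, assuming a chunk is a plain object with a `content` string (a hypothetical shape; the real type in `src/tools/error-lookup.ts` may differ). It follows the shape-only predicate from the Codex review follow-up: strip leading `#` heading markers, then treat a line as path-shaped if it contains `/` and no whitespace.

```typescript
// Hypothetical chunk shape for illustration; the real type in
// src/tools/error-lookup.ts may name or structure this differently.
interface SemanticChunk {
  content: string;
}

// Trim the line and remove any leading markdown heading marker ("#", "##", ...)
// so "# aztec-nr/foo.nr" is judged on "aztec-nr/foo.nr".
function stripHeadingMarker(line: string): string {
  return line.trim().replace(/^#+\s*/, "").trim();
}

// Path-shaped: contains "/" and has no whitespace anywhere.
function lineIsPathShaped(line: string): boolean {
  const text = stripHeadingMarker(line);
  return text.includes("/") && !/\s/.test(text);
}

// Useful: at least one non-blank line is not path-shaped.
// Length plays no role, so a single signature line is enough to keep a chunk.
function isUsefulSemanticChunk(chunk: SemanticChunk): boolean {
  const lines = chunk.content
    .split("\n")
    .map(stripHeadingMarker)
    .filter((line) => line.length > 0);
  if (lines.length === 0) {
    return false; // empty or heading-marker-only body
  }
  return lines.some((line) => !lineIsPathShaped(line));
}
```

Because the test is whitespace-based, doc comments (`/// ...`) and one-line signatures both survive, while a body made only of path headings is dropped.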

Defense-in-depth because:

- The MCP server can be pointed at any DocsGPT instance via `API_URL`. A fork or older instance may not have the server-side filter.
- Future ingest regressions could reintroduce path-only chunks.

When all returned chunks are path-only, `lookupAztecError` reports `semanticHealth: "no_results"` rather than `"ok"` with three useless paths.
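The health downgrade amounts to a filter followed by a count. This sketch assumes a hypothetical `summarizeSemantic` helper and a simplified `SemanticHealth` union; it only models the case where the backend call itself succeeded, so failure and skip states are never produced here.

```typescript
// Hypothetical health values: the PR names "ok" and "no_results" for this path
// and mentions failed / skipped states elsewhere.
type SemanticHealth = "ok" | "no_results" | "failed" | "skipped";

interface SemanticChunk {
  content: string;
}

interface SemanticOutcome {
  health: SemanticHealth;
  chunks: SemanticChunk[];
}

// Compact restatement of the path-only rule (hypothetical, not the shipped code):
// keep the chunk if any non-blank line, minus heading markers, is not "has '/' and no whitespace".
function isUsefulSemanticChunk(chunk: SemanticChunk): boolean {
  return chunk.content
    .split("\n")
    .map((line) => line.trim().replace(/^#+\s*/, "").trim())
    .filter((line) => line.length > 0)
    .some((line) => !(line.includes("/") && !/\s/.test(line)));
}

// Applied after the backend call succeeded: a clean response whose chunks
// are all path-only is reported as "no_results", not "ok".
function summarizeSemantic(raw: SemanticChunk[]): SemanticOutcome {
  const chunks = raw.filter(isUsefulSemanticChunk);
  return { health: chunks.length > 0 ? "ok" : "no_results", chunks };
}
```

Reporting `no_results` here lets the formatter treat "ran cleanly but found nothing usable" the same way as an empty response.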

Part 2 — `suppressWeakCatalog` in the formatter

New `suppressWeakCatalog` flag in `formatErrorLookupResult` (`src/utils/format.ts`). Behavior matrix:

| catalog state | semantic state | rendered |
|---|---|---|
| strong (≥70) | any | catalog as `## Known Errors` |
| weak only | useful results | semantic only; catalog suppressed |
| weak only | `no_results` / failed / skipped | semantic absent; catalog as `## Lower-Confidence Catalog Hints` |
| empty | useful results | semantic only |
| empty | `no_results` / etc. | "no matches found" message |
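The behavior matrix maps onto a small decision function. The sketch below is a hypothetical `planRender` helper, not the shipped formatter: it treats any match scoring ≥70 as strong (per the table), treats only `"ok"` as "useful results", and assumes semantic chunks still render alongside strong catalog matches, which the table does not state explicitly.

```typescript
// Hypothetical types; only the ≥70 threshold and the section names come from the PR.
type SemanticHealth = "ok" | "no_results" | "failed" | "skipped";
const STRONG_MATCH_THRESHOLD = 70;

interface CatalogMatch {
  title: string;
  score: number;
}

interface RenderPlan {
  knownErrors: boolean; // catalog under "## Known Errors"
  lowConfidenceHints: boolean; // catalog under "## Lower-Confidence Catalog Hints"
  relatedDocs: boolean; // semantic chunks under "## Related Documentation"
  noMatchesMessage: boolean; // fallback "no matches found" text
}

function planRender(catalog: CatalogMatch[], semantic: SemanticHealth): RenderPlan {
  const semanticUseful = semantic === "ok";
  const hasStrong = catalog.some((m) => m.score >= STRONG_MATCH_THRESHOLD);
  const weakOnly = catalog.length > 0 && !hasStrong;
  // Weak-only catalog hits are hidden when semantic already gave a substantive answer.
  const suppressWeakCatalog = weakOnly && semanticUseful;
  return {
    knownErrors: hasStrong,
    lowConfidenceHints: weakOnly && !suppressWeakCatalog,
    relatedDocs: semanticUseful,
    noMatchesMessage: catalog.length === 0 && !semanticUseful,
  };
}
```

Only rendering decisions are modeled; the catalog matches themselves would still be returned to programmatic callers regardless of the plan.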

When semantic gave us substance, the weak hint is pure noise the user keeps anchoring on — hide it. When semantic was unhelpful, the weak hint stays visible (it's the user's only signal) with a neutral "low-confidence cues only" note.

The catalog is still present in `result.catalogMatches` for programmatic consumers that need every signal; only the rendered output is filtered.

Test plan

- `npm run build` (tsc): clean
- `npx vitest run`: 282/282 (was 264; +18 new cases). The follow-up commit addressing Codex review adds one more regression test, for 283/283.
  - `isUsefulSemanticChunk` regression: path-only / md-heading-only / completely empty / signature-bearing (`pub fn poseidon`) / doc-comment-bearing / `pub struct` / multi-line path re-exports
  - `lookupAztecError` integration: all-path-only → `no_results`; mixed → only useful chunks surface
  - Suppression matrix: weak + semantic-ok hides the catalog; weak + every other state keeps it visible
  - Strong catalog matches always render normally
- After release: re-run `aztec_lookup_error("note already nullified")` against the updated docsgpt and this MCP version. Expected output: `## Related Documentation` with substantive chunks (post docsgpt#66 filtering), no "Contract already initialized" mention, and a clean message.

Companion docsgpt PR

Server-side filter: critesjosh/docsgpt-aztec#66. Order doesn't matter for shipping — either side independently improves the UX, both together close the loop.

🤖 Generated with Claude Code

…k catalog when semantic ran

Companion to docsgpt's apiref-empty-chunk filter (critesjosh/docsgpt-aztec#66). Two layered changes:

1. **Client-side defense-in-depth filter** (`isUsefulSemanticChunk` in `src/tools/error-lookup.ts`). Mirrors the Python helper `_is_empty_apiref_chunk` in docsgpt's `/api/search`: drops chunks whose body, after stripping the rendered file-path heading, is empty or path-only (every line contains `/` and no whitespace). Defense-in-depth because the MCP server can be pointed at any DocsGPT deployment via `API_URL`; a fork or older instance may not have the server-side filter, and a future ingest regression could reintroduce path-only chunks.

   Critically, legitimate signature-only chunks survive. The filter inspects content shape (whitespace presence in remaining lines), not length: `pub fn poseidon(input: [Field; N]) -> Field` has spaces, so it never trips the path-only test.

   When all returned chunks are path-only, `lookupAztecError` now reports `semanticHealth: "no_results"` (semantically accurate: the backend ran cleanly but didn't return anything useful) rather than "ok" with three useless paths.

2. **Suppress weak catalog hints when semantic was useful** (`formatErrorLookupResult` in `src/utils/format.ts`). The user-reported anchoring failure: when semantic returns content-bearing chunks AND every catalog match is below the strong-match threshold, the catalog hits are pure noise; the user keeps reading them as "the primary answer" even though semantic gave us the actual answer. The new `suppressWeakCatalog` flag hides the catalog section entirely from rendered output in that case. The matches remain in `result.catalogMatches` for programmatic consumers needing every signal.

   When semantic was unhelpful (no_results / failed / version mismatch / no client), the weak catalog is KEPT, since it's the user's only signal. The "Lower-Confidence Catalog Hints" header plus a neutral "treat as low-confidence cues only" note frame it honestly.

Tests: 282/282 (was 264, +18 across error-lookup + format suites).

- `isUsefulSemanticChunk` regression cases: path-only / md-heading-only / completely empty / signature-bearing / doc-comment-bearing / multi-line path re-exports.
- `lookupAztecError` integration: all-path-only chunks → no_results; mixed chunks → only useful ones surface.
- Suppression matrix: weak + semantic-ok hides catalog; weak + every other state keeps it visible.
- Strong catalog matches always render normally regardless of semantic state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Codex review feedback. Two related points:

1. The sourceish set used `match.source` and `match.title` to detect a rendered file-path heading line. But `/api/search` rewrites `source` to a public URL (`_aztec_source_url` produces e.g. `https://github.com/.../foo.nr`), so the bare-path heading `aztec-nr/.../foo.nr` never matched the URL. The heading was never stripped, and the chunk fell through to the path-shape check, which also missed because `# foo/bar.nr` contains whitespace from the markdown marker. Result: a class of empty chunks slipping through both gates.
2. The mitigation (strip a leading `#+ ` from each line before the path-shape predicate) makes the metadata coupling unnecessary, so the sourceish comparison is dropped entirely. The new helper `lineIsPathShaped` strips heading markers, then checks "contains `/` and no whitespace". Real signature lines always have whitespace (`pub fn ...`, `struct ...`, `pub use a::b;`), so they never trip the predicate.

Equivalent fix on the docsgpt side: critesjosh/docsgpt-aztec#66 gets the same shape-only simplification.

New regression test: a chunk with a `#`-prefixed heading body and a URL-rewritten source field (the exact failure mode Codex described) is correctly identified as "no useful results". 283/283 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
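To make the failure mode concrete, here is a simplified reconstruction contrasting the old metadata-based check with the new shape-only one. Both functions, the `SearchMatch` shape, and the placeholder URL are hypothetical; the old version only models what the commit message describes (source/title comparison, then a path-shape test on the raw line), not the original code.

```typescript
// Hypothetical shape of an /api/search match; field names follow the commit text.
interface SearchMatch {
  source: string;
  title: string;
  content: string;
}

function nonBlankLines(text: string): string[] {
  return text.split("\n").map((l) => l.trim()).filter((l) => l.length > 0);
}

// OLD (reconstructed): drop lines equal to source/title, then test the raw line.
function oldIsUseful(match: SearchMatch): boolean {
  const sourceish = new Set([match.source, match.title]);
  const remaining = nonBlankLines(match.content).filter((l) => !sourceish.has(l));
  if (remaining.length === 0) return false;
  return remaining.some((l) => !(l.includes("/") && !/\s/.test(l)));
}

// NEW: strip heading markers first; no metadata involved.
function lineIsPathShaped(line: string): boolean {
  const text = line.replace(/^#+\s*/, "").trim();
  return text.includes("/") && !/\s/.test(text);
}

function newIsUseful(match: SearchMatch): boolean {
  return nonBlankLines(match.content)
    .filter((l) => l.replace(/^#+\s*/, "").trim().length > 0)
    .some((l) => !lineIsPathShaped(l));
}

// Failure mode: source rewritten to a URL, body is only a "#"-prefixed path heading.
const emptyChunk: SearchMatch = {
  source: "https://example.com/aztec-nr/example/src/foo.nr", // hypothetical placeholder URL
  title: "foo.nr",
  content: "# aztec-nr/example/src/foo.nr",
};
console.log(oldIsUseful(emptyChunk)); // true: slipped through both gates
console.log(newIsUseful(emptyChunk)); // false: correctly dropped
```

The new predicate never reads `source` or `title`, so URL rewriting in `/api/search` can no longer affect it.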

github-actions · 2026-05-04T16:37:47Z

🎉 This PR is included in version 1.21.1 🎉

The release is available on:

Your semantic-release bot 📦🚀

critesjosh and others added 2 commits May 4, 2026 16:11

critesjosh merged commit 80f17fe into main May 4, 2026
6 checks passed

critesjosh deleted the fix/error-lookup-suppress-weak-when-semantic-useful branch May 4, 2026 16:36

github-actions Bot added the released label May 4, 2026
