Skip to content

chore(search,error-lookup): execFileSync for rg + trim noisy code refs#23

Merged
critesjosh merged 1 commit intomainfrom
chore/search-cleanup-and-error-ref-trim
May 4, 2026
Merged

chore(search,error-lookup): execFileSync for rg + trim noisy code refs#23
critesjosh merged 1 commit intomainfrom
chore/search-cleanup-and-error-ref-trim

Conversation

@critesjosh
Copy link
Copy Markdown
Collaborator

Summary

Two related cleanups in the search / error-lookup stack, surfaced in a review of how MCP tool output reaches the calling LLM.

src/utils/search.ts — argv-based ripgrep, no shell layer

The old code interpolated filePattern and query into a shell-command string for execSync, which created two bugs:

  • *.{nr,ts} was brace-expanded by /bin/sh and glob-expanded against the node-process cwd (not searchPath) before reaching ripgrep. Whether anything matched depended on which /bin/sh you happened to have. The same bug silently affected searchDocs with *.{md,mdx}.
  • A query that started with - (e.g. searchCode("-g")) would be reparsed by ripgrep as a flag once the shell quoting layer was gone.

This switches to argv-based execFileSync. Patterns reach rg verbatim, the query goes through -e <query> followed by a -- end-of-flags marker, and escapeShell is removed (it was the only consumer of the old shell-string form).

src/utils/error-lookup.ts — trim noisy code references

When the static catalog has fewer than three matches, lookupError falls back to grepping *.ts across aztec-packages for related code. Up to five raw hits used to flow straight through to the formatter, regularly including .test.ts, fixtures, and minified ABI/bytecode lines that didn't help anyone debug the error.

The fix over-fetches (RAW_CODE_LIMIT = 20), filters via a new isUsefulCodeRef heuristic, and caps the emitted list to 2.

  • Path-based exclusions are the primary signal: .test.ts, .spec.ts, .e2e.ts, plus segment-bounded __tests__ / tests / e2e / fixtures / mocks directories. Segment-bounded so real source dirs (attestations/, testdata/, latest/, contest/) survive untouched.
  • Content-shape exclusion is deliberately narrow: drop only lines ≥400 chars that contain a 200+ char contiguous hex run. Real generated-looking code (long enum maps, regex literals, selector tables) is preserved — better to over-include than drop a legitimate "this is the file you want" pointer.

The filter is localized to lookupError rather than searchCode because the suppression-of-semantic-fallback gate in src/tools/error-lookup.ts reads codeMatches.length, and aztec_search_code callers may legitimately want test files anyway.

Test plan

  • npm test — 306 tests pass (303 existing + 3 new test cases). New coverage:
    • search.test.ts: argv-shape assertions for single-extension and brace globs (the regression-fixing claim), -e <query> -- <path> ordering for flag-shaped queries, brace globs flowing unchanged through the manual-search fallback (globby/micromatch handles braces natively).
    • error-lookup.test.ts: isUsefulCodeRef and isMinifiedShape unit-by-unit, including negative controls for paths that contain test-like substrings (attestations/, testdata/, latest/, contest/), a long-but-not-hex-shaped generated source line that must survive, paths that begin with a test segment without a leading slash, and an integration test that pipes a synthetic rg stdout (5 tests + 1 minified + 2 real source) through lookupError and asserts both the two real refs survive and the over-fetch limit reaches rg as -m 40.
  • npm run build (tsc) — clean.

Risk

  • Behaviour change: anyone running with a /bin/sh that was accidentally leaving *.{md,mdx} alone (and so searchDocs was working) sees no change. Anyone running on a shell that was brace-expanding sees searchDocs start working. Net win.
  • escapeShell removal: only used by the old call site; verified by grep before removal.
  • isUsefulCodeRef false negatives: the heuristic prefers "keep" over "drop." Tests cover the obvious cases; the eval bucket of "user reports lookup_error gave them a useless test ref" should drop accordingly.

🤖 Generated with Claude Code

Two related cleanups in the search/error-lookup stack, surfaced in a
review of how the MCP tools format results back to the LLM.

src/utils/search.ts — switch ripgrep invocation from execSync to
execFileSync. The old code interpolated `filePattern` and `query`
into a shell-command string, so:

- `*.{nr,ts}` was brace-expanded by /bin/sh and glob-expanded against
  the node-process cwd (NOT searchPath) before reaching rg. The
  fallout depended on which `/bin/sh` you had — silently broken on
  some hosts, accidentally working on others. The same bug also
  affected `searchDocs` with `*.{md,mdx}`.
- A query that started with `-` would be reparsed by rg as a flag
  once the shell layer was removed.

The argv-based form sidesteps both. Patterns now reach rg verbatim,
and the query goes through `-e` followed by `--` so flag-shaped
queries are unambiguously the search pattern. `escapeShell` was the
only consumer of the old shell-string form and is removed.

src/utils/error-lookup.ts — when the static catalog has fewer than
three matches, lookupError falls back to grepping `*.ts` across
aztec-packages for related code. Up to five raw hits used to flow
straight through to the formatter, regularly including `.test.ts`,
fixtures, and minified ABI/bytecode lines that didn't help anyone
debug the error.

The fix over-fetches (RAW_CODE_LIMIT = 20), filters via a new
`isUsefulCodeRef` heuristic, and caps the emitted list to 2.

- Path-based exclusions are the primary signal: `.test.ts`,
  `.spec.ts`, `.e2e.ts`, plus segment-bounded `__tests__` / `tests` /
  `e2e` / `fixtures` / `mocks` directories. Segment-bounded so real
  source dirs (`attestations`, `testdata`, `latest`, `contest`)
  survive untouched.
- Content-shape exclusion is deliberately narrow: drop only lines
  ≥400 chars that contain a 200+ char contiguous hex run. Real
  generated-looking code (long enum maps, regex literals, selector
  tables) is preserved — we'd rather over-include than drop a
  legitimate "this is the file you want" pointer.

The filter is localized to lookupError rather than searchCode because
the suppression-of-semantic-fallback gate in tools/error-lookup.ts
reads `codeMatches.length`, and `aztec_search_code` callers may
legitimately want test files anyway.

Tests:
- search.test.ts asserts the argv shape (single-extension and brace
  globs both arrive as one element after `-g`), the `-e <query> --
  <path>` ordering for flag-shaped queries, and brace globs flowing
  unchanged through the manual-search fallback (globby/micromatch
  handles braces natively).
- error-lookup.test.ts covers `isUsefulCodeRef` and `isMinifiedShape`
  unit-by-unit, including negative controls for paths that contain
  test-like substrings, and an integration test that pipes a synthetic
  rg stdout (5 tests + 1 minified + 2 real source) through lookupError
  and asserts the two real refs survive — locking in both the
  filter behaviour and the over-fetch limit (`-m 40`).

All 306 tests pass; `tsc` is clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@critesjosh critesjosh merged commit b8b24f0 into main May 4, 2026
6 checks passed
@critesjosh critesjosh deleted the chore/search-cleanup-and-error-ref-trim branch May 4, 2026 21:34
critesjosh added a commit that referenced this pull request May 4, 2026
fix(search): publish execFileSync rg + lookup_error trim from #23
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 4, 2026

🎉 This PR is included in version 1.21.2 🎉

The release is available on:

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant