fix(architecture): "overview" compact subset + aspects schema enum + server-side validation#560
Conversation
`aspects=["overview"]` silently returned only `{total_nodes, total_edges}`
because "overview" was never registered in either guard:
- `want_aspect` (src/store/store.c) skipped all DB sub-queries
- `aspect_wanted` (src/mcp/mcp.c) skipped all JSON serialization
The JSON schema (mcp.c) accepts `aspects` as free-form strings without
an enum, so "overview" was silently accepted and produced an empty
result instead of a validation error.
This 2-line patch treats "overview" as a synonym for "all" in both
guards, restoring useful output for the most natural exploratory aspect
name.
Follow-up (not in this PR): add an enum constraint to the `aspects`
JSON schema so unknown aspect strings raise a validation error instead
of returning empty.
…pects schema enum Following review feedback on the initial alias approach: making "overview" identical to "all" still produces the same ~30k-char overflow payload (file_tree alone is ~275 entries). A useful "overview" needs to be a true compact subset. Changes: 1. "overview" is now a compact subset = every aspect EXCEPT file_tree. A small static helper `aspect_in_overview(name)` (in both store.c and mcp.c) gates which aspects "overview" expands to. Replacing it with a per-aspect block-list (e.g. also dropping clusters) is trivial later. 2. The aspects parameter now has a JSON Schema enum, so unknown aspect strings raise a validation error instead of being silently accepted (the previous footgun that masked the alias-not-registered bug for so long). Behavior: aspects=["all"] → unchanged, includes file_tree aspects=["overview"] → all aspects EXCEPT file_tree, stays under cap aspects=["bogus"] → validation error (was: silent empty response) aspects omitted → unchanged (= all)
The JSON-Schema enum added in the previous commit is advisory: it only fires when the MCP client validates client-side before sending. Many clients (including Claude Code) do not, so an unknown aspect like ["bogus"] still produced a silent-empty response — exactly the footgun that hid the original alias-not-registered bug. This commit adds the missing authoritative layer: server-side validation in handle_get_architecture(). A single SSOT array VALID_ASPECTS[] backs both the enum (advisory, client-side) and aspect_is_valid() (authoritative, server-side). Any future aspect addition stays in sync by editing one array. On an unknown aspect, the server returns isError:true with a model-friendly message listing the valid values, so an agent can self-correct without re-prompting: Unknown aspect 'bogus'. Valid: all, overview, structure, dependencies, routes, languages, packages, entry_points, hotspots, boundaries, layers, file_tree, clusters. Defense in depth — clients that DO validate still get a clean client-side error; clients that don't are now caught by the server. Cleanup: properly frees project and aspects_doc on the early-error path.
|
v3 summary — the PR went through three iterations; the final state to review is the combination of both commits:
All four behaviors hand-verified after binary reload: |
|
Thanks @Kirchlive — the
A small unit test for the bogus-aspect → error path would also be welcome. 🙏 |
|
Thanks @Kirchlive — the core fix is real and well-diagnosed: A few things before it can land:
The overview fix itself is small and valuable — happy to fast-track it once DCO/lint/rebase are sorted. Thanks! |
Summary
get_architecture(aspects=["overview"])previously returned only{total_nodes, total_edges}—"overview"was never registered in either aspect guard, and the JSON schema accepted any string (no enum), so the typo / unregistered token was silently accepted and produced an empty payload.This PR turns that footgun into a useful, validated tool.
What this PR does (3 layers)
1.
"overview"is a useful compact subset, not an alias to"all"The naive fix (treat
"overview"as a synonym for"all") still produces the ~30k-char overflow payload —file_treealone is ~275 entries on a mid-size repo, pushing the response pastMAX_MCP_OUTPUT_TOKENS."overview"now expands to every aspect exceptfile_tree, gated by a small static helper present in both files:src/store/store.c(gates DB sub-queries) andsrc/mcp/mcp.c(gates JSON serialization) both use it. Extending the block-list (e.g. also dropclusters) is one line.2. JSON-Schema enum on
aspects(advisory, client-side)Self-documenting and gives well-behaved MCP clients a clean client-side error on typos.
3. Server-side validation (authoritative)
The enum alone is advisory — many MCP clients (Claude Code included) do not validate against tool schemas before sending. Verified empirically: with only the enum in place,
aspects=["bogus"]still produced silent-empty.Server-side validation closes the loop:
In
handle_get_architecture, after parsing the aspects array, each token is checked againstVALID_ASPECTS. On the first unknown token the handler returnsisError: truewith a model-friendly message listing the valid values, so an agent can self-correct without re-prompting:Defense in depth: client-side enum catches well-behaved clients early; server-side
aspect_is_validcatches everyone else.Behavior matrix
aspectsaspects=["all"]file_treeaspects=["overview"]{total_nodes,total_edges}file_tree, stays under capaspects=["hotspots"]aspects=["bogus"]{total_nodes,total_edges}isError: truewith valid-values listLocally verified
Built without libgit2 (
make -f Makefile.cbm cbm LIBGIT2_FLAGS= LIBGIT2_LIBS=), installed to~/.local/bin/codebase-memory-mcp. Hand-tests on an ~8k-node project after binary reload:["overview"]→ 13 keys, nofile_tree, comfortably under output cap["all"]→ full payload incl.file_tree(unchanged)["hotspots"]→ only hotspots (unchanged)["bogus"]→ server returns the validation error string aboveCommits
refactor(get_architecture): make "overview" a compact aspect subset + add aspects schema enumfeat(get_architecture): server-side aspect validation (defense in depth)(Both are kept separate so the subset semantics and the validation layer can be reviewed independently.)
Test plan
get_architecture(aspects=["overview"])→ rich payload withoutfile_tree, fits under the MCP token capget_architecture(aspects=["all"])→ unchanged, still includesfile_treeget_architecture(aspects=["bogus"])→isError: truewith valid-values listget_architecture(aspects=["hotspots"])→ onlyhotspotsget_architecture()(noaspects) → unchanged, returns all