perf(cypher): inline result strings via CompactString + fix path-param BM25 matching#596
Merged
Conversation
Value::Str and NodeRef.{name,kind} held String, so every cypher result
row heap-allocated its strings — including the highest-frequency columns,
symbol `kind` ("Function"/"Class"/…) and `rel_type` ("Calls"/"Implements"/…),
which are always ≤24 bytes. Switching to compact_str::CompactString inlines
those (no heap), so a query returning N edges drops ~N allocs per such column.
Verified: all 23 representative result strings (every NodeKind/RelType name +
typical symbol identifiers) inline with zero heap allocation under the
24-byte inline budget. file_path stays String (paths routinely exceed 24B).
- value.rs: Value::Str(CompactString), NodeRef.name/kind: CompactString
- executor.rs: construction sites use `.into()` (From<&str>/From<String>)
- ecp-core: enable compact_str `serde` feature (CLI serializes Value via json!)
- test collects pull `.to_string()` to keep Vec<String> assertions intact
compact_str is already an ecp-core dependency; no new crate added.
bm25_search escaped only `:`, but a consumer contract_id like
`http:GET:/users/{id}` still contains `{` `}` — Tantivy query-parser
operators. parse_query raised a SyntaxError, so bm25_search returned an
empty result set for EVERY REST path-param contract, silently dropping a
whole class of cross-service links (no warning, looked like "no match").
normalize_for_bm25 now maps any non-alphanumeric (keeping `_`) to a space
on BOTH index and query sides, matching the default tokenizer's splitting
and keeping the two symmetric. Measured on a representative provider/consumer
corpus: truth-recall 4/7 → 7/7 (the 3 path-param contracts now match).
Regression tests assert path-param contract_ids parse, operators map to
spaces, and index/query normalization is identical.
Contributor
ecp impact cache (0 symbols) — internal, used by
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Two best-practice wins from the FU-2026-06-23-001 library-usage audit. compact_str is already an ecp-core dependency — no new crate.
1. perf(cypher): inline short result strings via CompactString
Value::StrandNodeRef.{name,kind}heldString, heap-allocating every cypher result string — including the highest-frequency columns, symbolkind("Function"/"Class"…) andrel_type("Calls"/"Implements"…), which are always ≤24 bytes. Switching toCompactStringinlines those (no heap), so a query returning N edges drops ~N allocs per such column.Verified no regression:
CompactString=String= 24 bytes;NodeRefpayload 80 = 80 bytes → zero enum bloat, identicalVec<Vec<Value>>footprint..into()≤.to_string()cost always (≤24B inline-copies, no alloc; >24B one alloc, same as before).cmp/Hash/Debug/serde all delegate toas_str()→ DISTINCT, ORDER BY, JSON output byte-identical (cross-checked by 5 review angles).file_pathstaysString(paths exceed the inline budget).2. fix(group): normalize Tantivy query operators in contract BM25 matching
bm25_searchescaped only:, but a consumercontract_idlikehttp:GET:/users/{id}still contains{}— Tantivy query-parser operators.parse_queryraised a SyntaxError, so every REST path-param contract silently returned no BM25 match — a whole class of cross-service links dropped with no warning.normalize_for_bm25now maps any non-alphanumeric (keeping_) to a space on both index and query sides, matching the default tokenizer's word splitting. Measured on a representative corpus: truth-recall 4/7 → 7/7.Tests: path-param contract_ids parse; operators map to spaces; end-to-end — a path-param consumer BM25-matches its same-id provider across repos (exercises the real index→query pipeline, which is what symmetry must guarantee).
Audit spikes (not shipped here)
access_uncheckedislen - size_of+ ptr-cast (no entry walk); PR perf: skip redundant rkyv re-validation + tighten contracts BM25 schema #590 already captured the win → wontfix.Test
cargo test -p ecp-core -p egent-code-plexus --tests— all green, 0 failedcargo clippy -p ecp-core -p egent-code-plexus --tests— clean