Skip to content

perf(cypher): inline result strings via CompactString + fix path-param BM25 matching#596

Merged
coseto6125 merged 2 commits into
mainfrom
perf/compact-str-value
Jun 22, 2026
Merged

perf(cypher): inline result strings via CompactString + fix path-param BM25 matching#596
coseto6125 merged 2 commits into
mainfrom
perf/compact-str-value

Conversation

@coseto6125

Copy link
Copy Markdown
Owner

Two best-practice wins from the FU-2026-06-23-001 library-usage audit. compact_str is already an ecp-core dependency — no new crate.

1. perf(cypher): inline short result strings via CompactString

Value::Str and NodeRef.{name,kind} held String, heap-allocating every cypher result string — including the highest-frequency columns, symbol kind ("Function"/"Class"…) and rel_type ("Calls"/"Implements"…), which are always ≤24 bytes. Switching to CompactString inlines those (no heap), so a query returning N edges drops ~N allocs per such column.

Verified no regression:

  • CompactString = String = 24 bytes; NodeRef payload 80 = 80 bytes → zero enum bloat, identical Vec<Vec<Value>> footprint.
  • .into().to_string() cost always (≤24B inline-copies, no alloc; >24B one alloc, same as before).
  • cmp/Hash/Debug/serde all delegate to as_str() → DISTINCT, ORDER BY, JSON output byte-identical (cross-checked by 5 review angles).
  • file_path stays String (paths exceed the inline budget).
  • 23/23 representative result strings inline with zero heap allocation.

2. fix(group): normalize Tantivy query operators in contract BM25 matching

bm25_search escaped only :, but a consumer contract_id like http:GET:/users/{id} still contains { } — Tantivy query-parser operators. parse_query raised a SyntaxError, so every REST path-param contract silently returned no BM25 match — a whole class of cross-service links dropped with no warning.

normalize_for_bm25 now maps any non-alphanumeric (keeping _) to a space on both index and query sides, matching the default tokenizer's word splitting. Measured on a representative corpus: truth-recall 4/7 → 7/7.

Tests: path-param contract_ids parse; operators map to spaces; end-to-end — a path-param consumer BM25-matches its same-id provider across repos (exercises the real index→query pipeline, which is what symmetry must guarantee).

Audit spikes (not shipped here)

  • Conjunction (AND vs OR): measured identical P@1/recall — AND buys nothing for parseable contracts → not adopted.
  • rkyv self-ref cache: access_unchecked is len - size_of + ptr-cast (no entry walk); PR perf: skip redundant rkyv re-validation + tighten contracts BM25 schema #590 already captured the win → wontfix.
  • Precision follow-up (threshold/candidate-window tuning now that path-param contracts score) tracked as FU-2026-06-23-003.

Test

  • cargo test -p ecp-core -p egent-code-plexus --tests — all green, 0 failed
  • cargo clippy -p ecp-core -p egent-code-plexus --tests — clean

Value::Str and NodeRef.{name,kind} held String, so every cypher result
row heap-allocated its strings — including the highest-frequency columns,
symbol `kind` ("Function"/"Class"/…) and `rel_type` ("Calls"/"Implements"/…),
which are always ≤24 bytes. Switching to compact_str::CompactString inlines
those (no heap), so a query returning N edges drops ~N allocs per such column.

Verified: all 23 representative result strings (every NodeKind/RelType name +
typical symbol identifiers) inline with zero heap allocation under the
24-byte inline budget. file_path stays String (paths routinely exceed 24B).

- value.rs: Value::Str(CompactString), NodeRef.name/kind: CompactString
- executor.rs: construction sites use `.into()` (From<&str>/From<String>)
- ecp-core: enable compact_str `serde` feature (CLI serializes Value via json!)
- test collects pull `.to_string()` to keep Vec<String> assertions intact

compact_str is already an ecp-core dependency; no new crate added.
bm25_search escaped only `:`, but a consumer contract_id like
`http:GET:/users/{id}` still contains `{` `}` — Tantivy query-parser
operators. parse_query raised a SyntaxError, so bm25_search returned an
empty result set for EVERY REST path-param contract, silently dropping a
whole class of cross-service links (no warning, looked like "no match").

normalize_for_bm25 now maps any non-alphanumeric (keeping `_`) to a space
on BOTH index and query sides, matching the default tokenizer's splitting
and keeping the two symmetric. Measured on a representative provider/consumer
corpus: truth-recall 4/7 → 7/7 (the 3 path-param contracts now match).

Regression tests assert path-param contract_ids parse, operators map to
spaces, and index/query normalization is identical.
@coseto6125 coseto6125 enabled auto-merge (squash) June 22, 2026 21:06
@coseto6125 coseto6125 added the merge-queue Opt-in to Mergify merge queue label Jun 22, 2026
@github-actions

Copy link
Copy Markdown
Contributor
ecp impact cache (0 symbols) — internal, used by ecp dev pr-analyze

[]

@github-actions github-actions Bot added the ecp:risk-low ecp signal label Jun 22, 2026
@coseto6125 coseto6125 merged commit 62a1aee into main Jun 22, 2026
18 checks passed
@coseto6125 coseto6125 deleted the perf/compact-str-value branch June 22, 2026 21:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ecp:risk-low ecp signal merge-queue Opt-in to Mergify merge queue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant