Skip to content

fix: fulltext() scoring on configured predicates with novelty present, plus fulltext and vector scoring perf#1430

Merged
bplatz merged 17 commits into
mainfrom
fix/fulltext-persisted-arena-lookup
Jul 5, 2026
Merged

fix: fulltext() scoring on configured predicates with novelty present, plus fulltext and vector scoring perf#1430
bplatz merged 17 commits into
mainfrom
fix/fulltext-persisted-arena-lookup

Conversation

@bplatz

@bplatz bplatz commented Jul 4, 2026

Copy link
Copy Markdown
Contributor

Problem

Field reports from Solo (Lambda + S3, ledger with ~212k configured rdfs:label values) showed fulltext(?v, "…") returning unbound for every row on an f:fullTextDefaults-configured predicate — indexed or not, lang-tagged or plain — even though the BM25 arena built correctly. A follow-up report measured ~27s wall time per query once scoring worked. While in the area, the inline (flat-rank) vector scoring path was audited for the same per-row inefficiencies.

Root cause (fulltext correctness)

BinaryScanOperator only uses late materialization (Binding::EncodedLit) when the overlay epoch is 0. The moment the ledger has any unindexed commit, every scanned row is eagerly materialized into Binding::Lit — and the Lit arm of eval_fulltext only scored @fulltext-datatype values, returning unbound for configured plain/lang-tagged strings. So inline full-text search worked only while a ledger was exactly caught up to its index and silently went dark one commit later. The existing memory-only reproducers always query at exactly index_t (empty novelty), which is why they kept passing while deployments with separate indexing/query processes kept regressing.

Fix: the Lit arm now mirrors the EncodedLit arm — resolve the arena bucket from the row's lang tag (dict-assigned lang_id; English bucket for untagged/@fulltext values) and score via unified BM25 with the novelty delta. No-arena semantics are unchanged: non-@fulltext strings stay unbound, @fulltext values keep the TF-saturation fallback. ExecutableQuery::uses_fulltext also now covers expression positions outside WHERE (ORDER BY binds, post-aggregation binds, HAVING) so the provider map is attached for those shapes.

Root cause (fulltext performance)

eval_fulltext runs once per bound row and repeated query-invariant work every time:

  • Analyzer::for_language clones the language's entire stopword set per call — and was called per row
  • the query string was re-tokenized and re-stemmed per row
  • effective stats / df / IDF were recomputed per row, with two binary searches over the ~60k-string term dictionary per term (once for df, once for tf)
  • materialized Lit rows re-analyzed the document text through the novelty path even when the doc was already in the arena

Fix:

  • Analyzer::shared(lang) — process-wide per-language analyzers, built once
  • PreparedFulltextQuery — bucket analyzer choice, query stems, arena term ids, IDF weights, and effective corpus stats computed once per (store_id, epoch, to_t, g_id, p_id, lang_id, query), moka-cached with a thread-local one-slot memo in front. store_id in the key pins prepared term ids to the arena instance they index into. Per-row scoring is a doc_bow probe plus one u32 binary search per query term. Scores are unchanged (same formula, same term order).
  • Materialized-Lit indexed fast path: resolve the value's string_id (persisted dict, then dict novelty) and score from the precomputed arena BoW; only genuinely unindexed values analyze text

Measured (fulltext)

50k configured labels, file-backed, one unindexed commit to force the materialized-row path (the production shape), release build:

per query per row
before 1.45 s ~29 µs
after 61 ms ~1.2 µs

Extrapolated to the field's 211k labels, the ~27s query should drop to ~1–1.5s, dominated by the base scan itself. The next lever, if ever needed, is posting-driven retrieval (WAND top-k pushdown, as the graph-source BM25 already does) — a planner feature, deliberately out of scope here.

Vector (flat-rank) scoring

The inline vector functions (dotProduct, cosineSimilarity, euclideanDistance) had the same class of per-row waste, audited and fixed in two commits:

Zero-copy eval wired to SIMD — both arguments were resolved through ComparableValue, which allocs and copies the full vector per row (including the query vector, which never changes: ~12 KB of memcpy per row on both sides for a 1536-dim embedding). The math used plain iterator closures — cosine made three separate passes — while the runtime-dispatched SIMD kernels in eval::vector_math had no production callers at all. Now: arguments resolve to borrowed &[f64] slices straight from the binding or the expression constant, compute routes through the AVX/SSE2/NEON-dispatched kernels, and cosine_f64/f32 use three dispatched dot passes above the SIMD length threshold instead of one scalar fused pass.

Arc-backed FlakeValue::Vector — the core type held Vec<f64>, so every FlakeValue clone through flake → binding → batch → output (and through transact/novelty merges on vector-heavy ledgers) deep-copied the whole embedding. Now Arc<[f64]>: clones are refcount bumps everywhere. Serde stays a plain number array via a local adapter (wire form identical, covered by a round-trip test); FlakeValueComparableValue vector conversions become free Arc clones. 21 files, almost entirely mechanical.

Zero-copy f32 shard reads for indexed vectors — indexed vector arguments (EncodedLit VECTOR_ID) reached eval through decode_value, which collects a fresh Vec<f64> + Arc per row even though the arena already hands out a zero-copy, shard-pinned f32 slice. Eval now borrows the packed f32 data via a new BinaryIndexStore::vector_slice accessor and widens into a reused thread-local scratch — same widened values, same f64 kernels, identical scores. Arena misses (ephemeral novelty handles) fall back to the decode path.

Measured (vector, honest numbers)

50k rows × 256-dim, release, results verified identical against scalar recomputation. Novelty-resident (materialized-row path):

function baseline after
dotProduct 163 ms ~137 ms
cosineSimilarity 205 ms ~139 ms
euclideanDistance 167 ms ~134 ms

Indexed (vector arena, late-materialized rows), same query and verification:

baseline zero-copy shard reads + top-k fast path
dotProduct, warm 27.0 ms ~21 ms ~3.3 ms (8×)

Novelty-resident gains are modest because that path's floor is per-row operator machinery (scan, binding construction, join, sort, format), not scoring or copies — measured directly by a 1024-dim run where the delta stays ~6%. The Arc change's primary value is the O(1)-clone guarantee for embeddings across transact/novelty/query and removing the foot-gun where any binding clone silently paid O(dims).

Flat-rank top-k fast path — the step-change. Detects the canonical similarity shape (single triple over a vector predicate + bind ?score dotProduct(?vec, target) + optional threshold / ORDER BY DESC(?score) / LIMIT, projecting {?s, ?score}) and executes it as a direct scan of the packed f32 shards instead of the operator pipeline: shards pinned once (VectorArenaSnapshot, zero per-row cache probes), rows from the shared overlay-merging cursor (novelty asserts appear, retractions cancel — same lane as the count fast paths), subject-range partitioned onto the global rayon pool with per-partition bounded top-k heaps, deterministic tie-break. Scores widen f32 → f64 through the same SIMD kernel as eval, so results are bit-identical to the pipeline; any uncertainty (non-vector rows, missing arena with novelty, dims mismatch, policy/multi-ledger/time-travel) bails to the generic plan. A dedicated parity test verifies indexed top-k, the overlay lane (novelty assert + retraction via upsert), multi-cardinality subjects, and fallback shapes against scalar ground truth.

Notable finding along the way: the indexed pipeline runs the identical flat-rank query 6.5× faster than novelty-resident (21 ms vs 137 ms per 50k rows) — keeping vector corpora indexed matters more than any eval tuning. Novelty-resident (arena-less) ledgers keep the pipeline path unchanged.

Test coverage

The fulltext suite was memory-only. New persisted-backend tests build in one Fluree instance, drop it, and query from a fresh instance over the same file storage (mirroring the split indexing/query deployment):

  • full reindex + fresh-process reload scores untagged and @en probes
  • incremental build fetches and extends the persisted arena; old and new values score
  • unindexed novelty commit scores and indexed values keep scoring with novelty present (the failing reproducer for the correctness bug)
  • language-tagged values build per-language arena buckets (French stemming, cross-bucket isolation) that round-trip the FIR6 root
  • fulltext_perf_50k_labels and vector_flatrank_perf_50k — ignored perf smoke harnesses, run manually with --ignored --nocapture; the vector harness verifies hit counts, top-5 ranks, and scores against scalar recomputation of the same generated vectors, across both a novelty-resident and a fully-indexed phase
  • vector_topk_fast_path_parity — scalar-ground-truth parity for the top-k fast path across its lanes (indexed base, overlay assert + retraction, multi-cardinality, fallback shapes)
  • FlakeValue::Vector serde round-trip test (untagged enum + with adapter → plain number array)

Validation: full-workspace clippy (all features, all targets) clean; query-lib (1168), core (693), binary-index (331), transact (274), grp_query (303), and flatrank (16) suites pass.

@bplatz bplatz requested review from aaj3f and zonotope July 4, 2026 00:56
@bplatz bplatz changed the title fix: fulltext() scoring on configured predicates with novelty present, plus 24x per-row eval speedup fix: fulltext() scoring on configured predicates with novelty present, plus fulltext and vector scoring perf Jul 4, 2026
@bplatz bplatz force-pushed the feature/event-time-commits branch from defbd6b to 4d4430c Compare July 4, 2026 14:07
bplatz added 14 commits July 4, 2026 10:07
…:Lit rows

Whenever a ledger carries any novelty, BinaryScanOperator disables late
materialization (binary_scan.rs overlay-epoch gate), so every scanned
value reaches eval_fulltext as Binding::Lit rather than EncodedLit. The
Lit arm only scored @fulltext-datatype values and returned unbound for
everything else — so fulltext() on f:fullTextDefaults-configured
predicates went null for EVERY row (indexed or not) the moment one
unindexed commit existed. A ledger queried exactly at index_t worked,
which is why the memory-only reproducers kept passing while S3/Lambda
deployments (indexing and querying in separate processes, rarely fully
caught up) kept regressing.

Generalize the Lit arm to mirror the EncodedLit arm: resolve the arena
bucket from the row's lang tag (dict-assigned lang_id, English bucket
for untagged/@fulltext values), analyze with the bucket's analyzer, and
score via unified BM25 with the novelty delta. No-arena semantics are
unchanged: non-@fulltext strings stay unbound, @fulltext values keep
the TF-saturation fallback.

Also extend ExecutableQuery::uses_fulltext to expression positions
outside WHERE (ORDER BY binds, post-aggregation binds, HAVING) so the
provider map is attached for those shapes too.

New persisted-backend coverage (file storage, fresh process per phase,
mirroring the two-Lambda deployment):
- full reindex + reload scores untagged and @en probes
- incremental build fetches and extends the persisted arena
- unindexed novelty commit scores; indexed values keep scoring (the
  failing reproducer for this bug)
- language-tagged values build per-language buckets (French stemming,
  cross-bucket isolation) that round-trip the FIR6 root
… row

fulltext() evaluated per bound row and repeated query-invariant work
every time: Analyzer::for_language clones the language's entire stopword
set per call, the query string was re-tokenized and re-stemmed per row,
and effective stats / df / IDF were recomputed per row with two binary
searches over the ~60k-string term dictionary per term (once for df,
once for tf). On materialized Binding::Lit rows — the norm whenever any
novelty exists — the document text was also re-analyzed per row through
the novelty path even when the doc was already in the arena. At 211k
values that added up to ~27s per query in the field.

- Analyzer::shared(lang): process-wide per-language analyzers, built
  once and leaked (bounded by the Language enum); for_language stays
  for build-time callers that want an owned instance.
- PreparedFulltextQuery: bucket analyzer choice, query stems, arena
  term ids, IDF weights, and effective corpus stats computed once per
  (store_id, epoch, to_t, g_id, p_id, lang_id, query), moka-cached with
  a thread-local one-slot memo in front. store_id in the key pins
  prepared term ids to the arena instance they index into. Per-row
  scoring is now a doc_bow probe plus one u32 binary search per term,
  replacing score_bm25_indexed/score_bm25_novelty (same formula, same
  term order — scores are unchanged).
- Binding::Lit indexed fast path: resolve the value's string_id
  (persisted dict, then dict novelty) and score from the precomputed
  arena BoW; only genuinely unindexed values analyze document text.

Perf smoke harness (ignored test, run manually): 50k configured labels,
full reindex, one unindexed commit to force the materialized-row path.
Before: 1.45s per query; after: 61ms (~24x) in release on M-series.
The inline vector functions (dotProduct, cosineSimilarity,
euclideanDistance) resolved both arguments through ComparableValue,
which allocs and copies the full vector per row — including the query
vector, which never changes across rows (~12 KB memcpy per row for a
1536-dim embedding on both sides). The math itself used plain iterator
closures: cosine made three separate passes over both vectors, and the
runtime-dispatched SIMD kernels in eval::vector_math had no production
callers at all (only a bench imported them).

- eval_binary_vector_fn resolves arguments to borrowed &[f64] slices
  straight from the row's Binding::Lit / the expression's constant
  (decoding EncodedLit once into an owned scratch), skipping the
  ComparableValue round-trip and both per-row copies.
- Compute routes through vector_math::dot_f64 / cosine_f64 / l2_f64
  (runtime AVX/SSE2/NEON dispatch above the 256-length threshold).
- cosine_f64/f32 now use three dispatched dot-kernel passes above the
  SIMD threshold instead of the single scalar fused pass (below the
  threshold the fused scalar loop still wins).

Perf smoke harness (ignored test, run manually): 50k novelty rows of
256-dim vectors, results verified against scalar recomputation of the
same generated vectors (exact hit counts, top-5 ranks, scores within
1e-6). End-to-end gains are modest because the scan/materialize floor
(~145 ms for 50k rows) dominates flat ranking, not the scoring eval:
dotProduct 163->148 ms, cosine 205->151 ms, euclidean 167->149 ms in
release on M-series; at 1024 dims the scan floor swallows nearly all
of it. Larger wins for flat vector search at scale would need Arc-based
vector values through scan/materialization, or HNSW top-k routing.
FlakeValue::Vector held Vec<f64>, so every clone through the flake →
binding → batch → output pipeline deep-copied the whole embedding
(12 KB per clone for a 1536-dim vector), and the same cost hit transact
and novelty merges on vector-heavy ledgers. Switch to Arc<[f64]>:
clones are now a refcount bump everywhere a FlakeValue is cloned.

- Serde stays a plain number array via a local with-adapter (no global
  serde rc opt-in; wire form identical to the previous Vec<f64> —
  covered by a new round-trip test).
- FlakeValue::max() keeps the empty-vector sentinel via a shared static.
- FlakeValue <-> ComparableValue vector conversions become free Arc
  clones instead of buffer copies; VecArg in eval/vector.rs drops its
  Owned variant (EncodedLit decode now hands back the Arc directly).
- Remaining call sites are mechanical: .into() at construction,
  .iter() where loops relied on Vec's IntoIterator.

Measured on the flat-rank harness (release, M-series): ~5-8% at
50k x 256-dim, ~6% at 20k x 1024-dim — the flat-rank floor is per-row
operator machinery, not vector copies. The primary value is the O(1)
clone guarantee for embeddings across transact/novelty/query paths and
removing the standing foot-gun where any binding clone silently paid
O(dims).
Indexed vector arguments (EncodedLit VECTOR_ID) reached eval through
decode_value, which collects a fresh Vec<f64> and converts it to an
Arc per row — two allocations plus a full widening copy for every
scored row, even though LazyVectorArena::lookup_vector already returns
a zero-copy, shard-pinned f32 slice.

- BinaryIndexStore::vector_slice(g_id, p_id, handle): public zero-copy
  lookup of the packed shard data.
- eval resolves EncodedLit VECTOR_ID args to the borrowed f32 slice and
  widens into a reused thread-local scratch at compute time — same
  widened values, same f64 kernels, so scores are unchanged. Arena
  misses (ephemeral novelty handles, no arena for the predicate) fall
  back to the decode path.
- Perf harness gains an indexed phase: full reindex + reload with empty
  novelty so rows late-materialize and eval sees EncodedLit, verified
  against the same scalar ground truth.

Measured (50k x 256-dim, release, M-series): indexed warm run
27.0 -> 21.0 ms (~22%; the eliminated per-row allocs). Novelty-resident
timing unchanged as expected — that path materializes values in the
scan, not eval. Incidental finding worth recording: the indexed
pipeline runs the identical query 6.5x faster than novelty-resident
(21 ms vs 137 ms per 50k rows), so keeping vector corpora indexed
dominates any eval-side tuning.
Detects the canonical vector-similarity shape — single triple over a
vector predicate, bind ?score dotProduct(?vec, <const|single-row VALUES
target>), optional threshold filter, optional ORDER BY DESC(?score),
optional LIMIT/OFFSET, projection over {?s, ?score} — and executes it
as a direct scan of the vector arena instead of driving every row
through the scan → bind → filter → sort pipeline.

Executor (fast_vector_topk.rs):
- rows come from the shared overlay-merging cursor (novelty asserts
  appear, retracted base rows cancel — the same merge lane the count
  fast paths use); novelty vector values (ephemeral handles) are
  pre-decoded once before the scan
- VectorArenaSnapshot pins every shard once for zero-probe reads;
  scoring widens f32 shard data to f64 and uses the same SIMD dot
  kernel as eval, so scores are bit-identical to the pipeline
- large base scans partition by subject ranges at leaf/leaflet
  boundaries onto the global rayon pool, each partition with its
  subject-sliced overlay ops and a bounded top-k heap; deterministic
  tie-break (score desc, then s_id)
- bails to the generic pipeline on any uncertainty: non-vector rows on
  the predicate, missing arena with novelty present, dims mismatch,
  unresolvable overlay ops, policy / multi-ledger / time-travel

Parity test (vector_topk_fast_path_parity) verifies against scalar
ground truth across lanes: indexed top-k, overlay with a novelty assert
plus a retraction via upsert, multi-cardinality subjects, and fallback
shapes (?vec projected) staying correct via the pipeline.

Measured (50k x 256-dim indexed, release, M-series): dotProduct
threshold + ORDER BY query 21.0 ms -> 3.3 ms (6.4x; 8x vs the
pre-zero-copy 27 ms), identical results. Novelty-resident ledgers
(no arena) keep the pipeline path unchanged.
The vector and fulltext scaling guidance predated the flat-rank
performance work (prepared fulltext scoring, zero-copy SIMD vector
eval, the vector top-k fast path) and treated all query shapes alike.

vector-search.md:
- Document the flat-rank top-k fast path: the canonical signature it
  serves, exact/strongly-consistent semantics, bail conditions, and
  fresh 50k x 768-dim measurements (novelty 324 ms / pipeline 30 ms /
  fast path 8.1 ms, ~6M vec/s).
- Split the HNSW rule-of-thumb table into two columns: top-k-signature
  queries stay interactive to ~1M vectors (and viable to ~10M
  memory-resident), while general-shape queries keep roughly the old
  thresholds. Call out the consistency trade (HNSW graph source is
  eventually consistent; flat scan is exact and read-your-writes).
- Replace the aspirational normalized-embedding bullet with actionable
  guidance: prefer dotProduct for unit-normalized embeddings (identical
  ranking to cosine, cheaper, and fast-path-eligible).

fulltext.md:
- Note that indexed-path numbers now hold with unindexed commits
  present (prepared per-query scoring + BoW lookup on materialized
  rows) — previously the with-novelty path was drastically slower.
- Quantify the 500K-1M tier (~1-2 us/doc, no top-k pruning inline) and
  correct the WAND mention (graph-source feature, not inline); add the
  consistency trade note.

README.md: qualified flat-scan vs HNSW summary to match.

Harness: vector_flatrank_perf_50k now also times the general-pipeline
lane (duplicate threshold filter defeats fast-path detection) so both
columns of the guidance stay measurable in one run.
The docs led with the @fulltext datatype and framed the #config-graph
approach as the advanced option. That inverts the actual guidance: the
config path is the standards-conformant one — searchable values keep
their xsd:string / rdf:langString datatypes (external consumers,
validation, federation all see ordinary literals) and language-tagged
values get per-language analysis. Tagging a value @fulltext replaces
its stored datatype with the Fluree-specific f:fullText, which the
docs never surfaced as a trade-off.

- fulltext.md: intro recommends f:fullTextDefaults first and states the
  datatype-replacement trade-off plainly; @fulltext is positioned as a
  quick-start for siloed databases or properties orthogonal to the core
  data model. 'Why a dedicated datatype?' and Portability now cover the
  external-RDF caveat; the when-to-use table leads with the config path.
- indexing-and-search README: entry-point ordering and decision list
  updated to match.
- cookbook-search.md: quick start keeps the self-contained @fulltext
  example but points standards-focused readers at f:fullTextDefaults
  up front (queries are identical either way).
- concepts/datatypes.md: @fulltext entry notes the stored-datatype
  consequence and the config alternative.
@bplatz bplatz force-pushed the fix/fulltext-persisted-arena-lookup branch from eab70ae to a2a8b4d Compare July 4, 2026 14:07

@aaj3f aaj3f left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix!

Comment on lines +536 to +538
if order_desc || limit.is_some() {
rows.sort_unstable_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a detected query with LIMIT but no ORDER BY, the fast path still sorts by score-desc (order_desc || limit.is_some()) and returns the top-k by score, whereas the generic pipeline returns k arbitrary rows (scan/subject order). SPARQL leaves bare-LIMIT order unspecified, so this is spec-legal, but it silently changes which rows come back once this optimization triggers, and it breaks the PR's stated "bit-identical to the pipeline" invariant for that shape. Cleanest fix is to decline the shape in detection so it routes to the pipeline unchanged:

// in detect_vector_topk, after computing order_var and before building the spec:
// A bare LIMIT with no ORDER BY means "arbitrary k rows"; imposing top-k-by-score
// here would diverge from the pipeline. Only serve LIMIT alongside DESC(?score).
if query.limit.is_some() && order_var.is_none() {
    return None;
}

(If serving bare-LIMIT is intentional, please add a comment at line 536 stating the divergence is accepted.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 0e1eb15

Comment on lines +728 to +737
for partial in partials {
match partial? {
Some(rows) => {
for (s, score) in rows {
merged.push(s, score);
}
}
None => return Ok(None), // a partition hit a bail condition
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scan_partitioned returns Ok(None) for two semantically different outcomes: "too few partitions, fall back to the serial lane" (lines 647/673/684) and "a partition hit a genuine Bail, defer to the pipeline" (line 735). The caller (lines 526–532) maps every None to "try scan_serial", so a genuine bail triggers a full redundant serial re-scan that will hit the same bail row and then defer. Correct result, wasted work on the (rare) bail path. Distinguish the two, e.g.:

// return type -> Result<Option<Result<Vec<(u64,f64)>, Bail>>> or an enum:
//   Ok(None)            => too few partitions, try serial
//   Ok(Some(Err(Bail))) => a partition bailed, defer straight to pipeline
//   Ok(Some(Ok(rows)))  => rows
None => return Ok(Some(Err(Bail))), // was: return Ok(None)

and have the caller short-circuit to Ok(None) (pipeline) on the Bail variant instead of re-scanning serially.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 09b13d8

Comment on lines +291 to +299
fn new(need: Option<usize>) -> Self {
match need {
Some(n) => Fold::TopK {
need: n,
heap: BinaryHeap::with_capacity(n + 1),
},
None => Fold::All(Vec::new()),
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fold::TopK eagerly pre-allocates BinaryHeap::with_capacity(need + 1) where need = limit.saturating_add(offset) (line 494) is a user literal. The realistic concern (verified): a large-but-legal limit (e.g. LIMIT 10000000 over a few thousand vectors) eagerly allocates one oversized heap per partition (up to MAX_PARTITIONS = 16, plus the merge heap and the serial-lane heap). The need + 1 overflow→debug-panic sub-case is real but pathological — it needs LIMIT ≈ 2^64, and release builds wrap to with_capacity(0) and stay correct. Clamp the pre-allocation to the data:

Some(n) => Fold::TopK {
    need: n,
    // cap the eager reservation; the heap grows if genuinely needed
    heap: BinaryHeap::with_capacity(n.saturating_add(1).min(4096)),
},

(The per-partition need could also be clamped to min(need, partition_row_estimate) in scan_partitioned, but a fixed cap is simplest.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 0e1eb15

Comment on lines +1467 to +1477
/// End-to-end parity for the flat-rank vector fast path against scalar
/// ground truth, across its lanes:
///
/// - indexed base (epoch 0): threshold + ORDER BY DESC + LIMIT top-k
/// - overlay lane: novelty assert (new top vector) + retraction of an
/// indexed vector via upsert — both must be reflected without reindexing
/// - multi-cardinality: a subject with two vectors yields two scored rows
/// - fallback shapes: an extra hydration triple and a projected `?vec`
/// must not detect (results still correct via the pipeline)
#[tokio::test]
async fn vector_topk_fast_path_parity() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vector_topk_fast_path_parity exercises the detection-time bails (projected ?vec, extra hydration triple) but not the runtime Scorer::scoreErr(Bail) → pipeline-fallback path — i.e. a non-vector object physically stored on the vector predicate with a bare {?s, ?score} projection (which passes detection, then bails mid-scan). vector_search_mixed_datatypes (line 399) has a non-vector value but projects it, so it never reaches the fast path. Add a case: insert a non-vector literal on the vector predicate, run the canonical {?s ?score} shape, and assert results equal the scalar ground truth (proving the runtime bail correctly defers to the generic plan and emits nothing partial).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 0e1eb15

Base automatically changed from feature/event-time-commits to main July 5, 2026 15:15
bplatz added 3 commits July 5, 2026 11:16
…ime bail

Three review-driven fixes to the fast vector top-k path:

- detect_vector_topk: decline a bare LIMIT with no ORDER BY. The scan lane
  sorts by score whenever a limit is present, so without this the fast path
  returned top-k-by-score while the generic pipeline returns k arbitrary
  scan-order rows — spec-legal but a silent divergence that broke the
  "bit-identical to the pipeline" invariant. Now only DESC(?score)+LIMIT is
  served; bare LIMIT routes to the pipeline unchanged.

- Fold::new: clamp the eager BinaryHeap reservation to min(need+1, 4096). `need`
  is a user literal (limit+offset), so a large-but-legal LIMIT would otherwise
  reserve an oversized heap per partition; push already caps growth at need.
  saturating_add also removes the pathological need+1 overflow debug-panic.

- test: add vector_topk_fast_path_runtime_bail_defers_to_pipeline — a non-vector
  literal on the vector predicate passes detection then trips Scorer::score ->
  Err(Bail) mid-scan; asserts the whole fast path defers and returns the
  pipeline's exact rows (no partial output). Complements the detection-time
  bail coverage already present.
…gs in a struct

- scan_partitioned now returns a PartitionScan enum (TooFewPartitions / Bailed /
  Rows) instead of an overloaded Ok(None). The caller short-circuits a genuine
  bail straight to the pipeline instead of re-scanning serially (which would hit
  the same bail row and waste a full pass); "too few partitions" still falls
  back to the serial lane. Correct before, but clearer and no redundant scan.

- Group the six params shared by both scan lanes (ctx, store, g_id, p_id,
  scorer, need) into a VectorScan struct. scan_partitioned drops from 9 args to
  2 and scan_serial from 7 to 2, removing the too_many_arguments allow; to_t /
  epoch are derived from ctx inside the partitioned lane.
@bplatz bplatz merged commit d21ccfc into main Jul 5, 2026
13 checks passed
@bplatz bplatz deleted the fix/fulltext-persisted-arena-lookup branch July 5, 2026 15:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants