
fix: filtered-DELETE staging livelock on novelty-heavy ledgers #1431

Open
bplatz wants to merge 4 commits into fix/fulltext-persisted-arena-lookup from fix/filtered-delete-staging-hang



@bplatz bplatz commented Jul 4, 2026


Problem

A production transactor (10 GB / ~6 vCPU) hit a two-pattern where + delete update that never finished staging; it was killed at the 900 s timeout on every attempt. The transaction retracted all triples of 92 tagged subjects (~21.5k flakes; 14 of those subjects carry 1,536-entry @list embedding vectors) on a ledger where all matched data lived in novelty (a delete-everything commit followed by re-inserts, with no index rebuild in between). The identical WHERE as a read-only SELECT returned 20,234 rows in ~2 s, and an unfiltered delete-everything on the same ledger committed in seconds, so the pathology was specific to filtered-delete staging.

Root cause

Two compounding costs:

  1. Per-retraction point lookups. hydrate_list_index_meta_for_retractions issued one range_with_overlay lookup per generated retraction to copy @list index metadata (FlakeMeta.i) from the asserted flake — ~21.5k serial lookups for this transaction.
  2. Per-call full-novelty translation. Every range_with_overlay call on the V3 provider walked the graph's entire novelty unbounded, dict-translated every overlay flake, then sorted and lifecycle-resolved the whole op set — per call.

Net cost: O(matched_triples × novelty · log novelty). That is a CPU livelock, which is exactly why giving it more wall-clock time never helped. The SELECT was fast because scans pay the translation once per operator; the delete-everything control ran while novelty was still small.
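A rough cost model restating the bounds in this description. Let M be the number of generated retractions (matched triples), N the graph's novelty size, G the number of distinct (subject, predicate) groups (G ≤ M, since every group holds at least one retraction), and w the number of overlay ops inside a lookup's key range. The second line is a sketch that assumes the translation is computed once and then served from the cache for the rest of staging; it is not a measured result.

$$
T_{\text{before}} = M \cdot O(N \log N)
\qquad
T_{\text{after}} \approx O(N \log N) + G \cdot O(\log N + w)
$$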

Fix

fluree-db-transact — group hydration candidates by (graph, subject, predicate) and issue one lookup per group, matching object values in memory. Cost now scales with distinct (subject, predicate) pairs instead of matched triples. Retraction semantics are unchanged: each retraction copies the first dt-compatible asserted list meta, so a value asserted at multiple list positions still loses exactly one entry per distinct WHERE binding (pinned by it_join_batched_overlay::batched_object_join_merges_novelty against reindexed ground truth).
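A minimal sketch of the grouping idea under simplified assumptions: Flake, lookup_sp and hydrate_grouped are hypothetical stand-ins rather than the fluree-db-transact types, "dt-compatible" is modeled as equal datatype ids, and the lookup is a linear filter that only counts its calls. Three retractions over two (subject, predicate) pairs cost two lookups instead of three.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified flake; `list_i` plays the role of FlakeMeta.i.
#[derive(Clone, Debug, PartialEq)]
struct Flake {
    g: u32,
    s: u64,
    p: u64,
    o: String,
    dt: u32,
    list_i: Option<u32>,
}

/// Stand-in for one range lookup returning every asserted flake for (g, s, p).
/// `calls` counts invocations so the batching is observable.
fn lookup_sp(asserted: &[Flake], g: u32, s: u64, p: u64, calls: &mut usize) -> Vec<Flake> {
    *calls += 1;
    asserted
        .iter()
        .filter(|f| f.g == g && f.s == s && f.p == p)
        .cloned()
        .collect()
}

/// Copies list metadata onto retractions using one lookup per (g, s, p) group.
/// Returns the number of lookups issued.
fn hydrate_grouped(retractions: &mut [Flake], asserted: &[Flake]) -> usize {
    let mut calls = 0;
    // Group retraction positions by (graph, subject, predicate).
    let mut groups: HashMap<(u32, u64, u64), Vec<usize>> = HashMap::new();
    for (i, r) in retractions.iter().enumerate() {
        groups.entry((r.g, r.s, r.p)).or_default().push(i);
    }
    for ((g, s, p), positions) in groups {
        let candidates = lookup_sp(asserted, g, s, p, &mut calls);
        for i in positions {
            let r = &mut retractions[i];
            // Match object values in memory; the first compatible asserted flake wins,
            // as a per-flake `.find()` would.
            if let Some(a) = candidates.iter().find(|a| a.o == r.o && a.dt == r.dt) {
                r.list_i = a.list_i;
            }
        }
    }
    calls
}

fn main() {
    let mk = |s: u64, o: &str, i: Option<u32>| Flake { g: 0, s, p: 10, o: o.to_string(), dt: 1, list_i: i };
    // Subject 1 has "a" asserted at list positions 0 and 2.
    let asserted = vec![mk(1, "a", Some(0)), mk(1, "b", Some(1)), mk(1, "a", Some(2)), mk(2, "c", Some(0))];
    let mut retractions = vec![mk(1, "a", None), mk(1, "b", None), mk(2, "c", None)];
    let calls = hydrate_grouped(&mut retractions, &asserted);
    println!("{} retractions, {} lookups", retractions.len(), calls);
}
```

Because the in-memory match uses .find(), the retraction of "a" picks up list index 0 even though "a" is also asserted at position 2, which mirrors the per-flake behavior the PR keeps.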

fluree-db-query / fluree-db-core / fluree-db-novelty — the systemic half:

  • New OverlayProvider::content_version() hook: a globally-unique content stamp. Novelty refreshes it from a process-wide counter on every mutation, so no two novelty states with different content ever share a version — across instances, clones, and ledgers (per-instance epoch values collide across divergent clones; a unit test pins this). Wrapper overlays (staged, historical, reasoner, combined) default to None and keep translating fresh.
  • A small cross-call LRU in the V3 range provider serves unfiltered overlay translations keyed on (store_id, index_t, content_version, to_t, g_id, index).
  • Range-bounded cursors now receive only the overlay-op window intersecting their key range (overlay_window_for_range, same pattern as BinaryScanOperator::open), so a cache hit costs O(log novelty + window) instead of an O(novelty) merge walk per lookup.
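A minimal sketch of a cross-call translation LRU keyed on the six-part tuple from the list above. TranslationKey, TranslationLru and get_or_translate are hypothetical names, the field types are guesses, and recency is tracked with a plain VecDeque (O(capacity) per touch), which is reasonable only because the cache is small. A key of None models an overlay whose content_version is None, or the uncached predicate-filtered form, so it always translates fresh.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::Arc;

/// Hypothetical cache key mirroring (store_id, index_t, content_version, to_t, g_id, index).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct TranslationKey {
    store_id: u64,
    index_t: i64,
    content_version: u64,
    to_t: i64,
    g_id: u16,
    index: u8, // which index order (e.g. SPOT/POST); the encoding is illustrative
}

/// Small LRU: the map holds shared translations, `order` tracks recency (front = least recent).
struct TranslationLru<V> {
    cap: usize,
    map: HashMap<TranslationKey, Arc<V>>,
    order: VecDeque<TranslationKey>,
}

impl<V> TranslationLru<V> {
    fn new(cap: usize) -> Self {
        assert!(cap > 0, "capacity must be positive");
        Self { cap, map: HashMap::new(), order: VecDeque::new() }
    }

    /// Returns a cached translation, or runs `translate` on a miss.
    /// `None` keys bypass the cache entirely (no trustworthy content stamp).
    fn get_or_translate(&mut self, key: Option<TranslationKey>, translate: impl FnOnce() -> V) -> Arc<V> {
        let Some(key) = key else {
            return Arc::new(translate());
        };
        if let Some(hit) = self.map.get(&key).cloned() {
            self.touch(key);
            return hit;
        }
        let value = Arc::new(translate());
        if self.map.len() >= self.cap {
            // Evict the least recently used entry.
            if let Some(oldest) = self.order.pop_front() {
                self.map.remove(&oldest);
            }
        }
        self.map.insert(key, Arc::clone(&value));
        self.order.push_back(key);
        value
    }

    /// Moves `key` to the most-recently-used end.
    fn touch(&mut self, key: TranslationKey) {
        if let Some(pos) = self.order.iter().position(|k| *k == key) {
            self.order.remove(pos);
        }
        self.order.push_back(key);
    }
}

fn main() {
    let key = |cv: u64| Some(TranslationKey { store_id: 1, index_t: 10, content_version: cv, to_t: 20, g_id: 0, index: 0 });
    let mut lru: TranslationLru<Vec<u64>> = TranslationLru::new(2);
    let mut translations = 0;
    for cv in [1, 1, 2, 1] {
        lru.get_or_translate(key(cv), || {
            translations += 1;
            vec![cv]
        });
    }
    println!("translations: {translations}"); // 2: versions 1 and 2 are each translated once
}
```

Because content_version changes on every novelty mutation, a commit naturally produces a new key; stale entries are never invalidated explicitly and simply age out of the LRU.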

This also removes the same quadratic exposure from other per-flake lookup loops (policy class lookups, upsert deletions, annotation cascades).
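The range-bounded window from the Fix list can be sketched as two binary searches over key-sorted overlay ops. window_for_range is a hypothetical helper, not the signature of overlay_window_for_range; keys are plain Ord values, and the half-open [lo, hi) convention is an assumption. Locating the window costs O(log N), so a cursor then walks only the w ops in its range instead of merging against all of novelty.

```rust
/// Returns the slice of `ops` (sorted by key) whose keys fall in [lo, hi).
fn window_for_range<'a, K: Ord, T>(ops: &'a [(K, T)], lo: &K, hi: &K) -> &'a [(K, T)] {
    // partition_point binary-searches for the first element failing the predicate.
    let start = ops.partition_point(|(k, _)| k < lo);
    let end = ops.partition_point(|(k, _)| k < hi).max(start); // guard against lo > hi
    &ops[start..end]
}

fn main() {
    // Keys stand in for (subject, predicate) prefixes in SPOT order.
    let ops = vec![((1, 10), "op-a"), ((2, 10), "op-b"), ((2, 11), "op-c"), ((3, 10), "op-d")];
    let window = window_for_range(&ops, &(2, 0), &(3, 0));
    println!("{:?}", window); // [((2, 10), "op-b"), ((2, 11), "op-c")]
}
```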

Results

  • The reported transaction shape goes from >900 s (never completing) to staging in seconds.
  • BSBM Update: a little over 2× throughput on the Update + Query mix benchmark.

Testing

  • New regression tests mirroring the reported topology: filtered two-pattern delete over @list-carrying subjects in delete-all + re-insert novelty, both pure-novelty and stacked on a published binary index (the indexed variant also exercises cache invalidation across commits).
  • Novelty unit test pinning content_version uniqueness across divergent clones and refresh on clear_up_to.
  • Full CI parity locally: fmt, clippy (all features/targets, -D warnings), nextest --workspace --all-features — 8569/8573 passed; the 4 failures are two known flakes (LocalStack port mapping, raft liveness demote) that pass in isolation, and two pre-existing follow_owl_imports failures traced to the base branch and fixed there separately (cc73f76db on feature/rdfs-enforcement-entailment).
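A toy illustration of the property the content_version unit test pins: a process-wide counter keeps stamps unique across divergent clones, while a per-instance epoch does not. ToyNovelty, ToyStaged, fresh_version and NEXT_CONTENT_VERSION are hypothetical, and the ContentVersioned trait only mimics the shape of OverlayProvider::content_version (an Option, with wrapper overlays defaulting to None); it is not that hook's real signature.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical process-wide stamp source. fetch_add is atomic, so every call
/// returns a distinct value, even across threads.
static NEXT_CONTENT_VERSION: AtomicU64 = AtomicU64::new(1);

fn fresh_version() -> u64 {
    NEXT_CONTENT_VERSION.fetch_add(1, Ordering::Relaxed)
}

/// Shape of the hook: wrapper overlays keep the default and opt out of caching.
trait ContentVersioned {
    fn content_version(&self) -> Option<u64> {
        None
    }
}

/// Toy novelty holding flake t-values.
#[derive(Clone)]
struct ToyNovelty {
    ts: Vec<u64>,
    epoch: u64,   // per-instance counter: two clones can reach the same value
    version: u64, // refreshed from the global counter on every mutation
}

impl ToyNovelty {
    fn new() -> Self {
        Self { ts: Vec::new(), epoch: 0, version: fresh_version() }
    }

    fn insert(&mut self, t: u64) {
        self.ts.push(t);
        self.epoch += 1;
        self.version = fresh_version();
    }

    /// Drops flakes at or below `t` (e.g. once they are covered by the index).
    fn clear_up_to(&mut self, t: u64) {
        self.ts.retain(|&x| x > t);
        self.epoch += 1;
        self.version = fresh_version();
    }
}

impl ContentVersioned for ToyNovelty {
    fn content_version(&self) -> Option<u64> {
        Some(self.version)
    }
}

/// Stand-in for a staged/historical wrapper: no stamp, so it is never cached.
struct ToyStaged;
impl ContentVersioned for ToyStaged {}

fn main() {
    let mut base = ToyNovelty::new();
    base.insert(1);
    let (mut a, mut b) = (base.clone(), base.clone());
    a.insert(2);
    b.insert(3); // different content from `a`
    println!("epochs equal: {}", a.epoch == b.epoch); // true: epochs collide
    println!("versions equal: {}", a.content_version() == b.content_version()); // false
}
```

An epoch keyed cache would hand a's translation to b here, since both report epoch 2 with different content; the global counter rules that out by construction.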

bplatz added 4 commits July 4, 2026 10:07
…taging

A where+delete update matching N triples paid one range_with_overlay
point lookup per generated retraction to hydrate @list index metadata.
Every such call re-translates and re-sorts the graph's entire novelty
overlay, so filtered-delete staging cost O(N x novelty log novelty) --
observed as a >900s livelock deleting ~21k triples (92 subjects with
1536-entry list vectors) on a novelty-heavy ledger, while the identical
SELECT returned in 2s.

Group hydration candidates by (graph, subject, predicate) and issue one
lookup per group, matching object values in memory. Cost now scales
with distinct (subject, predicate) pairs instead of matched triples.

Retraction semantics are unchanged: each retraction copies the first
dt-compatible asserted list meta, mirroring the per-flake lookup's
.find() -- a value asserted at multiple list positions still loses
exactly one entry per distinct WHERE binding, as pinned by the
object-probe-list-retract case in it_join_batched_overlay.rs.
…calls

Every range_with_overlay call on the V3 provider walked the graph's
entire novelty, dict-translated every overlay flake, and re-sorted the
op set -- per call. Point-lookup loops (staging list-meta hydration,
policy class lookups, upsert deletions, annotation cascades) therefore
cost O(calls x novelty log novelty) on novelty-heavy ledgers; combined
with per-retraction hydration this livelocked a filtered DELETE for
>900s while the identical SELECT ran in 2s.

Serve unfiltered translations from a small cross-call LRU, keyed on
(store_id, index_t, overlay content version, to_t, graph, index), and
give range-bounded cursors only the overlay-op window intersecting
their key range (same pattern as BinaryScanOperator::open) so a cache
hit costs O(log novelty + window) instead of an O(novelty) merge walk.

Identity comes from a new OverlayProvider::content_version hook: a
globally-unique stamp refreshed from a process-wide counter on every
Novelty mutation, so no two novelty states with different content ever
share a cache key -- across instances, clones, and ledgers (per-instance
epochs collide across divergent clones). Overlays that cannot vouch for
such a stamp (staged, historical, reasoner, combined) return None and
keep translating fresh, preserving their existing behavior; the
predicate-filtered translation form also stays uncached because the
allow-list changes both the translated and raw-fallback sets.
@bplatz bplatz requested review from aaj3f and zonotope July 4, 2026 14:19

@aaj3f aaj3f left a comment


No notes -- looks good!
