Add db_stress liveness profile and fix lost flush request#14830
Add db_stress liveness profile and fix lost flush request#14830xingbowang wants to merge 6 commits into
Conversation
A flush request can be capped at an older memtable while newer immutable memtables are added. PickMemtablesToFlush() could pick only part of the pending list and clear flush_requested_, causing the newer queued flush request to be skipped and leaving an immutable memtable unflushed. Keep flush_requested_ until all unstarted immutable memtables have been picked, and add unit and DB-level regressions for the lost-request case.
|
| Check | Count |
|---|---|
cppcoreguidelines-pro-type-member-init |
1 |
cppcoreguidelines-special-member-functions |
1 |
| Total | 2 |
Details
db/blob/blob_gen2_format.cc (1 warning(s))
db/blob/blob_gen2_format.cc:90:3: warning: uninitialized record type: 'trailer' [cppcoreguidelines-pro-type-member-init]
db_stress_tool/db_stress_test_base.cc (1 warning(s))
db_stress_tool/db_stress_test_base.cc:61:7: warning: class 'ScopedThreadOperation' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator [cppcoreguidelines-special-member-functions]
Codex Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit 00526b6 ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
Claude Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit 00526b6 SummaryThis PR adds an opt-in liveness watchdog to High-severity findings (0): No high-severity findings. Full review (click to expand)Findings🔴 HIGHNo high-severity findings. 🟡 MEDIUMM1. Torn read in watchdog could produce false timeout on
|
|
@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D107895052. |
Codex Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit e4d90c4 ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
Claude Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit e4d90c4 SummaryThis PR fixes a real and well-diagnosed flush scheduling bug where High-severity findings (0): No high-severity findings. Full review (click to expand)Findings🔴 HIGHNone. 🟡 MEDIUMM1. Integration test uses sleep-based polling --
|
| Context | Behavior Changed? | Analysis |
|---|---|---|
Atomic flush (atomic_flush_seqno_ != kMaxSequenceNumber) |
No | Old: if (false || num_flush_not_started_ == 0). New: same condition. Identical. |
| Non-atomic flush | Yes (fix) | Old: always cleared flush_requested_. New: only clears when num_flush_not_started_ == 0. This is the bug fix. |
| MemPurge | Compatible | BackgroundFlush:3730 calls FlushRequested() for mempurge CFs, which re-sets the flag. The fix doesn't interfere. |
| RollbackMemtableFlush | Compatible | Rollback increments num_flush_not_started_. With fix, flush_requested_ may already be true, ensuring re-scheduling. |
| WritePreparedTxnDB | No interaction | flush_requested_ is a scheduling flag, not a visibility control. |
| ReadOnly DB | No interaction | No flushes scheduled. |
| User-defined timestamps | Compatible | UDT reschedule logic (line 3692) checks separately. |
Busy-Loop Analysis
Claim: "flush_requested_ staying true won't cause busy-looping"
Verified correct:
PickMemtablesToFlushonly picks memtables with!flush_in_progress_. Each pick decrementsnum_flush_not_started_.- When
num_flush_not_started_ == 0,flush_requested_is cleared andimm_flush_neededis set to false. EnqueuePendingFlushfor non-atomic flush checks!cfd->queued_for_flush(), preventing duplicate enqueue while a flush is in-flight.- After flush completes,
PopFirstFromFlushQueueclearsqueued_for_flush. ThenMaybeScheduleFlushOrCompactionprocesses the queue. - The remaining unstarted memtable will be picked up when a new flush is enqueued (e.g., by the write path or another explicit flush request that sees
IsFlushPending() == true).
No infinite loop is possible because each flush iteration makes progress (picks at least one memtable or finds nothing to pick when all are in-progress/done).
Re-scheduling Path for Remaining Memtable
After the first partial flush completes:
queued_for_flushis cleared for the CF (line 3466)flush_requested_is still true (the fix)- Any subsequent call to
EnqueuePendingFlushfor this CF will succeed because!queued_for_flush()andIsFlushPending()both return appropriate values - The remaining memtable gets flushed when the next flush trigger fires (write path switch, explicit flush, or another
GetLiveFilesStorageInfocall)
Positive Observations
-
Well-diagnosed bug: The PR clearly explains the race condition with a concrete reproduction scenario (IngestExternalFile + GetLiveFilesStorageInfo).
-
Minimal flush fix: The fix is a 3-line deletion + 5-line comment change. The minimal diff reduces risk.
-
Both unit and integration tests: The flush fix has both a direct
MemTableListunit test (FlushRequestPersistsAfterPartialPick) and a DB-level integration test (IngestFlushRequestNotLostBehindQueuedFlush). -
Liveness watchdog is well-gated: Zero overhead when disabled (default). The
LivenessTrackingEnabled()check prevents any performance impact on normal stress test runs. -
RAII operation tracking:
ScopedThreadOperationensures operations are always popped, preventing stuck-operation false positives from code that returns early. -
Abort/resume compaction hardening: The new gating logic (serialization via CAS, compaction count tracking, write stall avoidance) addresses real issues where aggressive random abort/resume could starve compactions and cause secondary write stalls.
-
Comprehensive Python test coverage: The
db_crashtest_test.pyadditions thoroughly test parameter generation, command flag passing, timeout handling, and mode composability.
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
…est_liveness_bug # Conflicts: # db_stress_tool/db_stress_test_base.cc # tools/db_crashtest.py # tools/db_crashtest_test.py
|
@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D107895052. |
|
@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D107895052. |
🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit 5b27079 ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit 5b27079 SummaryThis PR has two parts: (1) a correctness fix for a lost flush request bug in High-severity findings (0): Full review (click to expand)Findings🔴 HIGHNone. 🟡 MEDIUMM1.
|
| Context | Does code execute? | Assumptions hold? | Action needed? |
|---|---|---|---|
| Atomic flush | YES | YES -- the fix makes flush_requested_ clearing stricter, which is safe for atomic flush |
None |
| Non-atomic flush | YES | YES -- this was the buggy path; now fixed | None |
| MemPurge | YES | YES -- mempurge adds memtables with older IDs; num_flush_not_started_ tracking is independent of ID ordering |
None |
| WritePreparedTxnDB | N/A (flush scheduling is orthogonal) | N/A | None |
| ReadOnly DB | N/A (no flush) | N/A | None |
Flush fix assumption stress test:
- Claim: "flush_requested_ should persist until all unstarted memtables are picked"
- Precondition 1:
num_flush_not_started_accurately tracks unstarted memtables. Verified: incremented inAdd(), decremented inPickMemtablesToFlush()andRollbackMemtableFlush(), with assertions guarding consistency. - Precondition 2: PickMemtablesToFlush will eventually be called with a high enough max_memtable_id. Verified:
IsFlushPending()returns true whileflush_requested_ && num_flush_not_started_ > 0, causingSchedulePendingFlushto enqueue work, which callsFlushJob::PickMemTable(). - No counterexample found where
flush_requested_would be stuck indefinitely -- the feedback loop throughIsFlushPending()-> scheduling -> picking ensures progress.
Positive Observations
- The flush fix is minimal and surgical -- only the condition for clearing
flush_requested_changed, with clear comments explaining the rationale. - The new
FlushRequestPersistsAfterPartialPickunit test directly covers the bug scenario with clean setup and assertions. - The
IngestFlushRequestNotLostBehindQueuedFlushintegration test is thorough with proper timeout guards and cleanup paths to avoid hanging CI. - The liveness watchdog uses a stack-based operation tracking design that handles nested operations gracefully.
- The
TryBeginAbortAndResumeCompactionsgating pattern (CAS + compaction progress check + write stall awareness) is a well-designed solution to prevent compaction starvation from aggressive abort/resume cycling. - The Python test infrastructure changes are well-tested with dedicated unit tests for parameter layering, timeout handling, and fault injection override behavior.
- Memory ordering on atomics is appropriate throughout --
relaxedfor counters,release/acquirefor operation stack depth and type.
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
|
@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D107895052. |
Summary
Add a db_crashtest liveness mode that runs db_stress with a watchdog for
foreground progress. The profile disables fault injection and uses longer runs
so failures are more likely to represent real forward-progress bugs instead of
expected injected I/O errors. The watchdog tracks completed operations, active
operation type/start time, and per-operation completion counts to distinguish
stuck writes from long admin operations.
Bug: The new mode exposed a latent flush scheduling bug. A flush request can be queued
with an effective max memtable id, then a newer immutable memtable can be added
before the background flush picks work. PickMemtablesToFlush() could pick only
the older memtables and still clear flush_requested_. A later queued flush for
the newer memtable would then be skipped because the column family no longer
looked flush-pending, leaving the immutable memtable permanently unflushed. In
stress this surfaced as IngestExternalFile() waiting in WaitForFlushMemTables(),
with writes eventually blocked behind the same unflushed memtable.
Fix: Keep flush_requested_ set until all unstarted immutable memtables have been
picked. Add a direct MemTableList regression for the partial-pick case and a
DB-level regression covering the ingest/get-live-files ordering that reproduced
the lost request.
Testing