Fix crash test Restore() failure with empty trace when replay_write_ops == 0 (#14840)#14840
Fix crash test Restore() failure with empty trace when replay_write_ops == 0 (#14840)#14840anand1976 wants to merge 1 commit into
Conversation
|
@anand1976 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D107968797. |
✅ clang-tidy: No findings on changed linesCompleted in 44.0s. |
…ps == 0 (facebook#14840) Summary: Pull Request resolved: facebook#14840 Differential Revision: D107968797
a4e4ab3 to
88a1016
Compare
Codex Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit 88a1016 ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
…ps == 0 (facebook#14840) Summary: `ExpectedState::Restore()` always attempted to open and replay the post-`saved_seqno_` trace. In the crash-test failure, recovery had already returned the DB exactly to `saved_seqno_`, so `replay_write_ops == 0` and the trace was legitimately empty. `FileTraceReader` reported `Status::Incomplete()` on that empty trace, which incorrectly failed restore. Skip the trace replay work when `replay_write_ops == 0`, but still run the normal state promotion, cleanup, and `saved_seqno_` reset path so later saves continue to work. Differential Revision: D107968797
88a1016 to
b00624a
Compare
🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit b00624a ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit b00624a SummaryClean, well-scoped fix for a legitimate crash test failure. The change correctly skips trace replay when High-severity findings (0): No high-severity findings. Full review (click to expand)Findings🟡 MEDIUMM1. Scattered guards could be simplified with early block skip --
|
| Context | Does code execute? | Assumptions hold? | Action needed? |
|---|---|---|---|
| WritePreparedTxnDB | YES (db_stress supports --use_txn) |
YES - recovered prepared txns are committed/rolled back in FinishInitDb() before SaveAtAndAfter() is called |
Safe |
| WriteUnpreparedTxnDB | YES | YES - same resolution path as WritePrepared | Safe |
| ReadOnly DB | NO - db_stress doesn't use ReadOnly for Restore | N/A | N/A |
| Multiple column families | YES | YES - sequence numbers are global | Safe |
| User-defined timestamps | YES (if configured) | YES - timestamps don't affect seqno arithmetic | Safe |
| Sequence number underflow | Guarded by existing check at line 1077-1078 | YES | Pre-existing protection, not affected by this PR |
Assumption stress-test results:
| Claim | Preconditions | Counterexample attempted | Result |
|---|---|---|---|
| "replay_write_ops == 0 means no writes survived" | SaveAtAndAfter() called with DB quiesced (API contract) |
Race condition during SaveAtAndAfter | INVALID - API contract requires no concurrent mutation |
| "Trace file is legitimately empty" | Writes traced after save are lost if DB recovers to saved_seqno_ | WAL replay bringing DB to saved_seqno_ with traced-but-lost writes | SAFE - lost writes shouldn't be replayed against a DB that doesn't have them |
| "2PC transactions create false replay_write_ops == 0" | Prepared txns resolved before SaveAtAndAfter | Prepared txn auto-committed during recovery | INVALID - db_stress resolves all prepared txns in FinishInitDb() before SaveAtAndAfter() |
| "Skipping Prepare() loses validation" | Trace file not used when replay_write_ops == 0 | Corrupted trace file | IRRELEVANT - trace is deleted during cleanup, validation serves no purpose |
Positive Observations
- The fix correctly preserves the entire state-promotion and cleanup flow (CopyFile, RenameFile, Open, DeleteFile, saved_seqno_ reset), which is the most important invariant.
- Short-circuit evaluation in
s.ok() && replay_write_ops > 0 && !handler->IsDone()is idiomatic C++ and prevents null dereference. - The existing
if (handler)null check at line 1195 naturally handles the no-replay case for statistics extraction. - The fix eliminates unnecessary file I/O when no replay is needed.
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
Summary:
ExpectedState::Restore()always attempted to open and replay the post-saved_seqno_trace. In the crash-test failure, recovery had already returned the DB exactly tosaved_seqno_, soreplay_write_ops == 0and the trace was legitimately empty.FileTraceReaderreportedStatus::Incomplete()on that empty trace, which incorrectly failed restore.Skip the trace replay work when
replay_write_ops == 0, but still run the normal state promotion, cleanup, andsaved_seqno_reset path so later saves continue to work.Differential Revision: D107968797