Skip to content

db_crashtest: re-enable track_and_verify_wals with disabled fault injection (#14860)#14860

Open
hx235 wants to merge 1 commit into
facebook:mainfrom
hx235:export-D108797734
Open

db_crashtest: re-enable track_and_verify_wals with disabled fault injection (#14860)#14860
hx235 wants to merge 1 commit into
facebook:mainfrom
hx235:export-D108797734

Conversation

@hx235

@hx235 hx235 commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Summary:

See title. This is to harvest the coverage of the feature before we fix the tricky compatibility issue (see #13871)

Test plan:
Will only land after intensive rehersal stress test passes

Differential Revision: D108797734

@meta-cla meta-cla Bot added the CLA Signed label Jun 16, 2026
@meta-codesync

meta-codesync Bot commented Jun 16, 2026

Copy link
Copy Markdown

@hx235 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D108797734.

@github-actions

Copy link
Copy Markdown

✅ clang-tidy: No findings on changed lines

Completed in 0.0s.

@meta-codesync meta-codesync Bot changed the title db_crashtest: re-enable track_and_verify_wals with disabled fault injection db_crashtest: re-enable track_and_verify_wals with disabled fault injection (#14860) Jun 16, 2026
hx235 pushed a commit to hx235/rocksdb that referenced this pull request Jun 16, 2026
…ection (facebook#14860)

Summary:
Pull Request resolved: facebook#14860

See title. This is to harvest the coverage of the feature before we fix the tricky compatibility issue (see facebook#13871)

**Test plan:**
Will only land after intensive rehersal stress test passes

Differential Revision: D108797734
@hx235 hx235 force-pushed the export-D108797734 branch from 23b9dfc to 25cef6d Compare June 16, 2026 21:14
…ection (facebook#14860)

Summary:

See title. This is to harvest the coverage of the feature before we fix the tricky compatibility issue (see facebook#13871)

**Test plan:**
Will only land after intensive rehersal stress test passes

Differential Revision: D108797734
@hx235 hx235 force-pushed the export-D108797734 branch from 25cef6d to ca26282 Compare June 16, 2026 21:26
@github-actions

Copy link
Copy Markdown

🟡 Codex Code Review

Auto-triggered after CI passed — reviewing commit ca26282


Codex review failed before producing findings.

WARNING: proceeding, even though we could not create PATH aliases: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

  • Codex may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /codex-review [context] — Request a code review
  • /codex-query <question> — Ask about the PR or codebase

@github-actions

Copy link
Copy Markdown

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit ca26282


Summary

Small, well-scoped test-infrastructure change that re-enables track_and_verify_wals in db_crashtest.py with appropriate guardrails. The fault injection exclusions follow established patterns in the file and the approach is sound given the known incompatibility.

High-severity findings (0):
No high-severity findings.

Full review (click to expand)

Findings

🟡 MEDIUM

M1. Missing sync_fault_injection disable — tools/db_crashtest.py:1573
  • Issue: The new sanitization block disables all 8 *_fault_one_in parameters but does NOT disable sync_fault_injection. The sync_fault_injection parameter (line 379) also triggers error recovery paths that flush memtables and rotate WALs — the same mechanism cited in the comment as the reason for disabling the other fault injections. The existing "direct write" sanitization block (line 1047) disables sync_fault_injection alongside the exact same 8 fault parameters, suggesting it should be included here too.
  • Root cause: sync_fault_injection triggers a different (sync-loss) recovery path, but it still involves WAL rotation that could conflict with track_and_verify_wals.
  • Suggested fix: Add dest_params["sync_fault_injection"] = 0 to the new block, or explicitly document why sync fault injection is safe with WAL tracking if it is.
M2. Missing error_recovery_with_no_fault_injection consideration — tools/db_crashtest.py:1573
  • Issue: The parameter error_recovery_with_no_fault_injection (line 369) can trigger error recovery even without fault injection enabled. If the incompatibility is specifically about auto-recovery flushing memtables and rotating WALs, this parameter could also trigger the problematic path.
  • Root cause: The comment says "fault injections trigger auto recovery" but error_recovery_with_no_fault_injection explicitly exercises recovery without injection.
  • Suggested fix: Verify whether error_recovery_with_no_fault_injection=1 combined with track_and_verify_wals=1 is safe. If not, disable it in the same block.

🟢 LOW / NIT

L1. Ordering robustness — tools/db_crashtest.py:1573
  • Issue: The new block is placed at the very end of finalize_and_sanitize, which is the correct position (it won't be overridden by later blocks). The pattern matches other sanitization blocks in the function.
  • Suggested fix: None needed — positive observation about correct placement.

Cross-Component Analysis

Concern Status
Does the block cover all fault injection params? 8 of 8 *_fault_one_in covered; sync_fault_injection possibly missing
Ordering conflicts with other sanitization blocks None — placed last, no later block re-enables faults
Pattern consistency with existing blocks Matches the pattern at lines 1047-1055 (direct write block) except for sync_fault_injection

Positive Observations


ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase

@joshkang97 joshkang97 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review automatically exported from Phabricator review in Meta.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants