Verify CPU corruption after op via --verify_cpu_corruption_dir (#14852)
hx235 wants to merge 1 commit into
@hx235 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D107999834.

✅ clang-tidy: No findings on changed lines
Completed in 91.2s.
Codex Code Review - OBSOLETE
Superseded by a newer AI review.

🟡 Codex Code Review
Auto-triggered after CI passed, reviewing commit 9aebab1

❌ Codex review failed before producing findings.

ℹ️ About this response
Generated by Codex CLI.
Claude Code Review - OBSOLETE
Superseded by a newer AI review.

✅ Claude Code Review
Auto-triggered after CI passed, reviewing commit 9aebab1

Summary
This PR adds a useful CPU corruption detection layer to db_stress via

High-severity findings (2):

Findings
🔴 HIGH
H1.

| Context | Compatible? | Notes |
|---|---|---|
| NonBatchedOpsStressTest | YES | Primary target, state tracking works |
| BatchedOpsStressTest | NO | IsStateTracked()=false, expected state not populated |
| MultiOpsTxnsStressTest | NO | IsStateTracked()=false, different data model |
| CfConsistencyStressTest | NO | IsStateTracked()=false |
| User-defined timestamps | NO | ReadOptions.timestamp not set (H2) |
| Wide columns / PutEntity | PARTIAL | Default column correct, non-default not validated |
| Merge operations | YES | db_->Get() returns merged result correctly |
| Background compaction | YES | Compaction preserves data visibility; compaction filter is already mutually exclusive with most features |
Assumption stress-test results:
- "Single writer": Valid with --threads=1. Background compaction doesn't change logical data visibility (only reorganizes files). Compaction filters are already excluded by existing flag validation.
- "Fault injection off": Mostly valid but `secondary_cache_fault_one_in` is missed (M2).
- "Only OK/NotFound statuses": False -- IOError and other statuses possible even without injection (M3).
Positive Observations
- The void-to-Status refactor for `TestCompactRange` and `TestCompactFiles` is clean and correctly propagates through the only override (`MultiOpsTxnsStressTest`).
- The `DataCorruption` struct with Slice fields has correct lifetime management -- Slices reference caller-owned buffers (`db_value` string and `expected_scratch` array) that outlive the struct usage.
- JSON output uses `std::quoted()` for `op_status` field escaping and hex encoding for values -- safe against injection.
- The early-return pattern in `MaybeVerifyCpuCorruption()` (checking `FLAGS_verify_cpu_corruption_dir.empty()`) has negligible overhead on the normal (disabled) path.
- Placing verification before re-enabling fault injection (see "put" case comments) is correct and well-documented.
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
- `/claude-review [context]`: Request a code review
- `/claude-query <question>`: Ask about the PR or codebase
…acebook#14852)

Summary:
Detection layer of the CPU corruption injector (coming up). With `--verify_cpu_corruption_dir=<dir>`, db_stress reads back the full keyspace after every write/flush/compaction op and compares it to the expected-values model, classifying any mismatch by `kind`: `lost` / `resurrected` / `wrong-value` (silent data corruption) or `detected-corruption` (a status/checksum-caught error). Each finding is written to `<dir>/data_corruption.<tid>.json` ({kind, cf, key, value_from_db, value_from_expected, op_status}) and routed through db_stress's standard `VerificationAbort` for a clean exit-1. A startup guard requires `--threads=1` and all fault injection off so the read-back is single-writer and the only corruption present is the injected one.

Differential Revision: D107999834
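The startup guard described in this commit message can be pictured with a small sketch. This is not the db_stress code: the flag dictionary, the function name `check_startup_flags`, the rule that any flag ending in `_fault_one_in` counts as fault injection, and the message wording are all assumptions made for illustration.

```python
from typing import Dict, List


def check_startup_flags(flags: Dict[str, object]) -> List[str]:
    """Return the reasons a (hypothetical) startup guard would reject `flags`."""
    errors: List[str] = []
    if not flags.get("verify_cpu_corruption_dir"):
        return errors  # feature disabled, nothing to validate

    # Single writer: the read-back must not race with other worker threads.
    if flags.get("threads", 1) != 1:
        errors.append("--verify_cpu_corruption_dir requires --threads=1")

    # Assumption: every fault-injection knob is named "*_fault_one_in".
    active = sorted(
        name for name, value in flags.items()
        if name.endswith("_fault_one_in") and value
    )
    if active:
        errors.append("requires all fault injection off: " + ", ".join(active))
    return errors


if __name__ == "__main__":
    print(check_startup_flags(
        {"verify_cpu_corruption_dir": "/tmp/run", "threads": 2,
         "read_fault_one_in": 5}))
```

Collecting all violations before exiting lets one run report every misconfigured flag at once; a real guard may instead exit on the first failure.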
Fixed meaningful AI feedback
…acebook#14852)

Summary:
Detection layer of the CPU corruption injector (coming up). With `--verify_cpu_corruption_dir=<dir>`, db_stress reads back the full keyspace after every write/flush/compaction op and compares it to the expected-values model, classifying any mismatch by `kind`: `lost` / `resurrected` / `wrong-value` (silent data corruption) or `detected-corruption` (a status/checksum-caught error). Each finding is written to `<dir>/data_corruption.<tid>.json` ({kind, cf, key, value_from_db, value_from_expected, op_status}) and routed through db_stress's standard `VerificationAbort` for a clean exit-1. A startup guard requires `--threads=1` and all fault injection off so the read-back is single-writer and the only corruption present is the injected one.

**Test plan:**

1. Startup guard rejects misconfiguration:
```
--threads=2 -> exit 1: "--verify_cpu_corruption_dir requires --threads=1"
--read_fault_one_in=5 -> exit 1: "requires all fault injection off"
```
2. No false positive (clean CORE preset run, no injection):
```
$ db_stress --verify_cpu_corruption_dir=<dir> --threads=1 (full protections, all *_fault_one_in=0)
... exit 0; no data_corruption.<tid>.json produced; "Verification successful"
```
3. Write-path CPU corruption injection (coming up, e.g., gdb flips a register inside MemTable::Add), then the immediate post-op read-back catches it. Real `<dir>/data_corruption.<tid>.json`:

silent data corruption -- write returned OK but the key is gone on read-back:
```
{"kind":"lost","cf":0,"key":9814,"value_from_db":"","value_from_expected":"010000000504070609080B0A0D0C0F0E","op_status":"Get: NotFound"}
```
detected corruption -- read-back Get returns Corruption via the memtable per-key checksum:
```
{"kind":"detected-corruption","cf":0,"key":139,"value_from_db":"","value_from_expected":"","op_status":"Get: Corruption: Corrupted memtable entry, per key-value checksum verification failed."}
```

Differential Revision: D107999834
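A minimal sketch of how a read-back result might map onto the four `kind` values named above. The function `classify` and its string status codes are hypothetical. The meanings of `resurrected` and `wrong-value` are inferred from their names, since the description only shows `lost` and `detected-corruption` examples.

```python
from typing import Optional


def classify(db_status: str,
             db_value: Optional[bytes],
             expected: Optional[bytes]) -> Optional[str]:
    """Map one key's read-back result to a finding kind, or None if it matches.

    db_status: "OK", "NotFound" or "Corruption" (illustrative status names).
    expected:  value from the expected-values model, None if the key
               should not exist.
    """
    if db_status == "Corruption":
        return "detected-corruption"  # a status/checksum caught the error
    if db_status == "NotFound":
        # The key is missing from the DB: a problem only if the model has it.
        return "lost" if expected is not None else None
    if db_status == "OK":
        if expected is None:
            return "resurrected"  # assumed meaning: DB has a key the model lacks
        return None if db_value == expected else "wrong-value"
    raise ValueError(f"unexpected status: {db_status}")


if __name__ == "__main__":
    print(classify("NotFound", None, b"\x01\x00"))  # lost
```

Unexpected statuses raise here rather than being folded into a kind, which mirrors the idea that only OK, NotFound and Corruption are classified.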
Fix format
🟡 Codex Code Review
Auto-triggered after CI passed, reviewing commit aafd57d

❌ Codex review failed before producing findings.

ℹ️ About this response
Generated by Codex CLI.
✅ Claude Code Review
Auto-triggered after CI passed, reviewing commit aafd57d

Summary
Well-structured, focused PR that adds CPU corruption detection to db_stress with appropriate safety guards. The design is sound: full-keyspace read-back after each op under single-writer constraints. The void-to-Status refactoring of

High-severity findings (0): No high-severity findings.

Findings
🔴 HIGH
None.

🟡 MEDIUM
M1. Missing startup guard for

| Context | Assumptions hold? | Action needed? |
|---|---|---|
| `--num_dbs > 1` | File collision risk (M1) | Guard or fix filename |
| `--enable_compaction_filter` | Safe - only removes already-deleted keys | None |
| `--use_txn` + `commit_bypass_memtable` | Safe - committed before read-back | None |
| `--ttl > 0` | Potential false positive (M2) | Consider guarding |
| User-defined timestamps | Handled correctly | None |
| BlobDB | Safe - Get() handles blob indirection | None |
| Background compaction/flush | Safe - doesn't modify expected state | None |
Positive Observations
- Clean refactoring: The `void -> Status` change for `TestCompactRange`/`TestCompactFiles` is minimal and correctly updates the only override.
- Proper fault injection window: MaybeVerifyCpuCorruption is called BEFORE re-enabling fault injection, ensuring clean read-back.
- Defensive unexpected-status handling: `ImmediateExit(1)` for non-OK/non-NotFound/non-Corruption statuses is appropriate.
- Good startup guards: Catches test mode incompatibilities and all fault injection flags.
- Tech debt acknowledgment: The TODO noting duplication with VerifyDb and the refactoring plan is good practice.
- Correct user-timestamp handling: Read-back correctly sets a read timestamp when `FLAGS_user_timestamp_size > 0`.
)

Summary:
Injection layer of the CPU corruption injector (tools/cpu_corruption_injector/injector.py). It runs inside gdb and corrupts a register by a bit flip in exactly one db_stress op (i.e., write, foreground compaction, or flush) per stress test run. Detection is at db_stress (#14852); orchestration is coming up.

How one run works
- The orchestration layer, coming up, randomly picks which op instance (so corruption lands at different points in the LSM's life) and which target_fn per run (so it has a reasonable number of instructions to step under a reasonable time limit); injector.py picks which instruction within target_fn.
- Attach: gdb starts with injector.py's parameters passed via -iex and the db_stress command after --args, so db_stress runs unmodified. Example:
```
gdb --batch --nx \
  -iex "py import sys; sys.argv=['injector.py','--op','write','--op_index','42','--entry_fn','rocksdb::MemTable::Add','--target_fn','rocksdb::MemTable::Add','--corruptions_per_op','1','--seed','7','--dir','<rundir>']" \
  -x tools/cpu_corruption_injector/injector.py \
  --args <db_stress> --threads=1 --verify_cpu_corruption_dir=<rundir> ...
```
- Reach the op: entry_fn is called exactly once per stress test run's op, so the op_index-th op is its op_index-th call. The orchestration layer picks op_index. `injector_navigate.py` breaks on entry_fn and sets a gdb ignore-count of op_index-1 to fast-forward to the op_index-th one.
- Warm up: `injector_critical_instruction.py` chooses a "critical instruction" (one that moves key/value bytes through general-purpose or vector registers, or sets a branch flag) uniformly within the `target_fn` (within `entry_fn`) chosen by the orchestration layer. To do that, it needs to approximate how many such instructions `target_fn` contains, hence this warm-up phase: it single-steps the first call of target_fn to count the critical instructions and pick an index, then corrupts the instruction at that index on a later call.
- Corrupt: on a later call of target_fn, `injector_critical_instruction.py` single-steps to the m-th critical instruction and bit-flips the register through `injector_register_corruption.py`. How the register is corrupted depends on the instruction.
- Record: `injector_telemetry.py` provides telemetry to capture the corruption for later analysis.

**Test plan:**

1. Isolated tests (real gdb-captured x/i fixtures): test_inject_critical_instruction 6/6
2. E2E test on navigation, inject, telemetry will be done in the later orchestration PR. Below is inject.json from such a run:
```
{
  "injection_result": "injected",
  "db_stress_crash_signal": null,
  "op": "write",
  "op_index": 279,
  "entry_fn": "rocksdb::MemTable::Add",
  "target_fn": "rocksdb::MemTable::Add",
  "critical_instruction_index": 37,
  "corruptions": [
    {
      "instruction": "mov %rsi,0x8c8(%rbx)",
      "register": "rsi",
      "corruption_type": "bit_flip",
      "before": "0x7fffee4c64d8",
      "after": "0x7fffee4c64c8",
      "details": {
        "source": "rocksdb::Arena::AllocateAligned @ ./fbcode/internal_repo_rocksdb/repo/memory/arena.cc:135",
        "call_chain": [
          "rocksdb::Arena::AllocateAligned @ ./fbcode/internal_repo_rocksdb/repo/memory/arena.cc:135",
          "rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)::{lambda()#1}::operator()() const @ fbcode/internal_repo_rocksdb/repo/memory/concurrent_arena.h:65",
          "rocksdb::ConcurrentArena::AllocateImpl<rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)::{lambda()#1}>(unsigned long, bool, rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)::{lambda()#1} const&) @ fbcode/internal_repo_rocksdb/repo/memory/concurrent_arena.h:145",
          "rocksdb::ConcurrentArena::AllocateAligned @ fbcode/internal_repo_rocksdb/repo/memory/concurrent_arena.h:63",
          "rocksdb::InlineSkipList<rocksdb::MemTableRep::KeyComparator const&>::AllocateNode @ fbcode/internal_repo_rocksdb/repo/memtable/inlineskiplist.h:868",
          "rocksdb::InlineSkipList<rocksdb::MemTableRep::KeyComparator const&>::AllocateKey @ fbcode/internal_repo_rocksdb/repo/memtable/inlineskiplist.h:855",
          "rocksdb::(anonymous namespace)::SkipListRep::Allocate @ ./fbcode/internal_repo_rocksdb/repo/memtable/skiplistrep.cc:36",
          "rocksdb::MemTable::Add @ ./fbcode/internal_repo_rocksdb/repo/db/memtable.cc:1157"
        ]
      }
    }
  ],
  "ops_seen": 279,
  "critical_instructions_seen": 38
}
```

Differential Revision: D107999835
…ook#14852)

Summary:
Detection layer of the CPU corruption injector (facebook#14858). With `--verify_cpu_corruption_dir=<dir>`, db_stress reads back the full keyspace after every write/manual flush/manual compaction op and compares it to the expected-values model, classifying any mismatch by `kind`: `lost` / `resurrected` / `wrong-value` (silent data corruption) or `detected-corruption` (a status/checksum-caught error). Each finding is written to `<dir>/data_corruption.<tid>.json` ({kind, cf, key, value_from_db, value_from_expected, op_status}) and routed through db_stress's standard `VerificationAbort` for a clean exit-1. A startup guard requires `--threads=1` and all fault injection off so the read-back is single-writer and the only corruption present is the injected one.

Bonus: a minor refactoring of the surrounding error-handling code in these ops.

**Test plan:**

1. Startup guard rejects misconfiguration:
```
--threads=2 -> exit 1: "--verify_cpu_corruption_dir requires --threads=1"
--read_fault_one_in=5 -> exit 1: "requires all fault injection off"
```
2. No false positive (clean CORE preset run, no injection):
```
$ db_stress --verify_cpu_corruption_dir=<dir> --threads=1 (full protections, all *_fault_one_in=0)
... exit 0; no data_corruption.<tid>.json produced; "Verification successful"
```
3. Write-path CPU corruption injection (coming up, e.g., gdb flips a register inside MemTable::Add), then the immediate post-op read-back catches it. Real `<dir>/data_corruption.<tid>.json`:

silent data corruption -- write returned OK but the key is gone on read-back:
```
{"kind":"lost","cf":0,"key":9814,"value_from_db":"","value_from_expected":"010000000504070609080B0A0D0C0F0E","op_status":"Get: NotFound"}
```
detected corruption -- read-back Get returns Corruption via the memtable per-key checksum:
```
{"kind":"detected-corruption","cf":0,"key":139,"value_from_db":"","value_from_expected":"","op_status":"Get: Corruption: Corrupted memtable entry, per key-value checksum verification failed."}
```
4. See PR [todo]'s spread in the outcome for verification of detection

Differential Revision: D107999834
)

Summary:
This PR is the injection layer of the CPU corruption injector. It runs inside gdb and randomly corrupts a register by a bit flip in exactly one db_stress op (i.e., write, foreground compaction, or flush) per stress test run. The detection layer is at db_stress (#14852); the orchestration layer is coming up.

__How one run works__
- The orchestration layer, coming up, randomly picks which stress test `op` instance (so corruption can land at different points in the LSM shape journey) and which `target_fn` of that `op` (so as to cap the number of instructions to step under a reasonable limit); `injector.py` in this PR randomly picks which instruction within the `target_fn` to inject (so corruption can land at different points of a `target_fn`).
- Attach: gdb starts with injector.py's parameters passed via -iex and the db_stress command after --args, so db_stress runs unmodified. Example:
```
gdb --batch --nx \
  -iex "py import sys; sys.argv=['injector.py','--op','write','--op_index','42','--entry_fn','rocksdb::MemTable::Add','--target_fn','rocksdb::MemTable::Add','--corruptions_per_op','1','--seed','7','--dir','<rundir>']" \
  -x tools/cpu_corruption_injector/injector.py \
  --args <db_stress> --threads=1 --verify_cpu_corruption_dir=<rundir> ...
```
- Navigate: The orchestration layer will pick op_index. `entry_fn` is called exactly once per stress test run's op, so the op_index-th op is its op_index-th call. `injector_navigate.py` breaks on `entry_fn` and sets a gdb ignore-count of op_index-1 to fast-forward to the op_index-th one. It also breaks at the first `target_fn` within that `entry_fn`.
- Warm up: `injector_critical_instruction.py` chooses a "critical instruction" (one that moves key/value bytes through general-purpose or vector registers, or sets a branch flag) uniformly within the `target_fn` chosen by the orchestration layer. To do that, it needs to approximate how many such instructions the `target_fn` contains, hence this warm-up phase: it single-steps the instructions within the first encounter of `target_fn` to count them and draw the critical instruction index, then corrupts the instruction at that index on a later call.
- Corrupt: on a later call of `target_fn`, `injector_critical_instruction.py` single-steps to the m-th critical instruction and bit-flips the register through `injector_register_corruption.py`. How the register is corrupted depends on the instruction. If the m-th instruction of the current `target_fn` call is not a critical instruction, the injector tries the next `target_fn` call until it runs out of `target_fn` calls.
- Record: `injector_telemetry.py` provides telemetry to capture the corruption for later analysis.

**Test plan:**

1. Isolated tests (real gdb-captured x/i fixtures): test_inject_critical_instruction
2. E2E test on navigation, inject, telemetry will be done in the later orchestration PR. Below is inject.json from such a run:
```
{
  "injection_result": "injected",
  "db_stress_crash_signal": null,
  "op": "write",
  "op_index": 279,
  "entry_fn": "rocksdb::MemTable::Add",
  "target_fn": "rocksdb::MemTable::Add",
  "critical_instruction_index": 37,
  "corruptions": [
    {
      "instruction": "mov %rsi,0x8c8(%rbx)",
      "register": "rsi",
      "corruption_type": "bit_flip",
      "before": "0x7fffee4c64d8",
      "after": "0x7fffee4c64c8",
      "details": {
        "source": "rocksdb::Arena::AllocateAligned @ ./fbcode/internal_repo_rocksdb/repo/memory/arena.cc:135",
        "call_chain": [
          "rocksdb::Arena::AllocateAligned @ ./fbcode/internal_repo_rocksdb/repo/memory/arena.cc:135",
          "rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)::{lambda()#1}::operator()() const @ fbcode/internal_repo_rocksdb/repo/memory/concurrent_arena.h:65",
          "rocksdb::ConcurrentArena::AllocateImpl<rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)::{lambda()#1}>(unsigned long, bool, rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)::{lambda()#1} const&) @ fbcode/internal_repo_rocksdb/repo/memory/concurrent_arena.h:145",
          "rocksdb::ConcurrentArena::AllocateAligned @ fbcode/internal_repo_rocksdb/repo/memory/concurrent_arena.h:63",
          "rocksdb::InlineSkipList<rocksdb::MemTableRep::KeyComparator const&>::AllocateNode @ fbcode/internal_repo_rocksdb/repo/memtable/inlineskiplist.h:868",
          "rocksdb::InlineSkipList<rocksdb::MemTableRep::KeyComparator const&>::AllocateKey @ fbcode/internal_repo_rocksdb/repo/memtable/inlineskiplist.h:855",
          "rocksdb::(anonymous namespace)::SkipListRep::Allocate @ ./fbcode/internal_repo_rocksdb/repo/memtable/skiplistrep.cc:36",
          "rocksdb::MemTable::Add @ ./fbcode/internal_repo_rocksdb/repo/db/memtable.cc:1157"
        ]
      }
    }
  ],
  "ops_seen": 279,
  "critical_instructions_seen": 38
}
```

Differential Revision: D107999835
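The warm-up draw and the bit flip described in this commit message can be sketched in plain Python, outside gdb. Everything here is illustrative: `is_critical` is a crude mnemonic check standing in for the real classifier, and the function names are hypothetical. The last two helpers are checked against the `before`/`after` register values from the inject.json sample, which differ in exactly one bit.

```python
import random
from typing import List, Optional, Sequence


def is_critical(insn: str) -> bool:
    """Hypothetical stand-in: treat mov*/cmp*/test* mnemonics as critical."""
    parts = insn.split()
    return bool(parts) and parts[0].startswith(("mov", "cmp", "test"))


def draw_critical_index(warmup_trace: Sequence[str], seed: int) -> Optional[int]:
    """Count critical instructions seen during warm-up, then draw one uniformly."""
    count = sum(1 for insn in warmup_trace if is_critical(insn))
    if count == 0:
        return None  # nothing to corrupt in this function
    return random.Random(seed).randrange(count)  # seeded, so reproducible


def flip_bit(value: int, bit: int, width: int = 64) -> int:
    """Return `value` with one bit inverted, as a single-bit register flip."""
    if not 0 <= bit < width:
        raise ValueError(f"bit {bit} outside a {width}-bit register")
    return value ^ (1 << bit)


def flipped_bits(before: int, after: int) -> List[int]:
    """List the bit positions that differ between two register values."""
    diff = before ^ after
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


if __name__ == "__main__":
    trace = ["mov %rsi,0x8c8(%rbx)", "add $0x8,%rsp", "cmp %rax,%rbx", "ret"]
    print(draw_critical_index(trace, seed=7))
    print(flipped_bits(0x7FFFEE4C64D8, 0x7FFFEE4C64C8))
```

Using a dedicated `random.Random(seed)` instance keeps the draw reproducible from the `--seed` value without touching global random state.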
…ook#14852)

Summary:
Detection layer of the CPU corruption injector (facebook#14858). With `--verify_cpu_corruption_dir=<dir>`, db_stress reads back the full keyspace after every write/manual flush/manual compaction op and compares it to the expected-values model, classifying any mismatch by `kind`: `lost` / `resurrected` / `wrong-value` (silent data corruption) or `detected-corruption` (a status/checksum-caught error). Each finding is written to `<dir>/data_corruption.<tid>.json` ({kind, cf, key, value_from_db, value_from_expected, op_status}) and routed through db_stress's standard `VerificationAbort` for a clean exit-1. A startup guard requires `--threads=1` and all fault injection off so the read-back is single-writer and the only corruption present is the injected one.

Bonus: a minor refactoring of the surrounding error-handling code in these ops.

**Test plan:**

1. Startup guard rejects misconfiguration:
```
--threads=2 -> exit 1: "--verify_cpu_corruption_dir requires --threads=1"
--read_fault_one_in=5 -> exit 1: "requires all fault injection off"
```
2. No false positive (clean CORE preset run, no injection):
```
$ db_stress --verify_cpu_corruption_dir=<dir> --threads=1 (full protections, all *_fault_one_in=0)
... exit 0; no data_corruption.<tid>.json produced; "Verification successful"
```
3. Write-path CPU corruption injection (coming up, e.g., gdb flips a register inside MemTable::Add), then the immediate post-op read-back catches it. Real `<dir>/data_corruption.<tid>.json`:

silent data corruption -- write returned OK but the key is gone on read-back:
```
{"kind":"lost","cf":0,"key":9814,"value_from_db":"","value_from_expected":"010000000504070609080B0A0D0C0F0E","op_status":"Get: NotFound"}
```
detected corruption -- read-back Get returns Corruption via the memtable per-key checksum:
```
{"kind":"detected-corruption","cf":0,"key":139,"value_from_db":"","value_from_expected":"","op_status":"Get: Corruption: Corrupted memtable entry, per key-value checksum verification failed."}
```
4. See PR facebook#14866 test plan's spread in the outcome for verification of detection

Differential Revision: D107999834
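For later analysis, a finding record like the `lost` example in this test plan could be read back with a few lines of Python. This is a sketch under assumptions: the helper name `load_finding` is hypothetical, and treating both value fields as hex strings follows the review note that values are hex-encoded.

```python
import json
from typing import Any, Dict


def load_finding(text: str) -> Dict[str, Any]:
    """Parse one data_corruption record and decode its hex-encoded values."""
    record = json.loads(text)
    for field in ("value_from_db", "value_from_expected"):
        # An empty string decodes to b"", i.e. no value was available.
        record[field] = bytes.fromhex(record[field])
    return record


if __name__ == "__main__":
    sample = (
        '{"kind":"lost","cf":0,"key":9814,"value_from_db":"",'
        '"value_from_expected":"010000000504070609080B0A0D0C0F0E",'
        '"op_status":"Get: NotFound"}'
    )
    finding = load_finding(sample)
    print(finding["kind"], finding["key"], len(finding["value_from_expected"]))
```

Decoding to `bytes` up front makes it easy to compare the expected and observed values directly when triaging a `wrong-value` finding.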
Summary:
Detection layer of the CPU corruption injector (#14858). With `--verify_cpu_corruption_dir=<dir>`, db_stress reads back the full keyspace after every write/manual flush/manual compaction op and compares it to the expected-values model, classifying any mismatch by `kind`: `lost` / `resurrected` / `wrong-value` (silent data corruption) or `detected-corruption` (a status/checksum-caught error). Each finding is written to `<dir>/data_corruption.<tid>.json` ({kind, cf, key, value_from_db, value_from_expected, op_status}) and routed through db_stress's standard `VerificationAbort` for a clean exit-1. A startup guard requires `--threads=1` and all fault injection off so the read-back is single-writer and the only corruption present is the injected one.

Bonus: a minor refactoring of the surrounding error-handling code in these ops.

Test plan:
1. Startup guard rejects misconfiguration:
2. No false positive (clean CORE preset run, no injection):
3. Write-path CPU corruption injection (coming up, e.g., gdb flips a register inside MemTable::Add), then the immediate post-op read-back catches it. Real `<dir>/data_corruption.<tid>.json`:
   - silent data corruption -- write returned OK but the key is gone on read-back:
   - detected corruption -- read-back Get returns Corruption via the memtable per-key checksum:
4. See PR #14866 test plan's spread in the outcome for verification of detection

Differential Revision: D107999834