CPU corruption injector: runner + db_stress preset flags by hx235 · Pull Request #14866 · facebook/rocksdb

hx235 · 2026-06-18T22:17:43Z

Summary:
Orchestration layer of the CPU corruption injector -- the glue between detection (#14852) and injection (#14858).

One CPU-corruption injection says little on its own. What matters is the OUTCOME DISTRIBUTION over many injections -- how often a corruption is harmlessly absorbed (NO_EFFECT), crashes the process (CRASH), is caught by an integrity check (CORRUPTION), or, more importantly, slips through as a silent data corruption (SDC) -- and which code paths most often lead to each outcome. A trustworthy distribution needs a (somewhat) repeatable and reproducible harness, as well as a db_stress configuration in which an injected corruption is both reachable and attributable to the chosen stress test op (i.e., write, foreground compaction, or flush).

Therefore this PR implements a runner that can launch N independent runs for a chosen op type (i.e., write, foreground compaction, or flush). Each run picks where to inject, runs db_stress under gdb via injector.py (#14858), and is classified into one outcome bucket (#14852).

The runner has DB_STRESS_PRESET -- the pinned db_stress config that isolates a single injected corruption (single-threaded, integrity checks on, other fault injection off, auto-compaction off). It also runs a gdb preflight that fails fast when the build or gdb cannot support injection (for example, because a hard-coded target_fn has been renamed), launches many runs in parallel, and writes one summary.json per campaign (the outcome distribution plus each run's record). The whole run set is reproducible from one logged base_seed (run i uses base_seed + i).
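The seeding rule (run i uses base_seed + i) combined with parallel launching can be sketched as follows. This is only an illustration under assumptions: `run_seeds`, `launch_campaign`, and the `run_one` callable are hypothetical names, not the runner's actual API.

```python
from concurrent.futures import ThreadPoolExecutor


def run_seeds(base_seed, runs):
    """Per-run seeds: run i uses base_seed + i, as the description states."""
    return [base_seed + i for i in range(runs)]


def launch_campaign(base_seed, runs, run_one, jobs=4):
    """Launch `runs` independent runs in parallel and keep results in run order.

    `run_one(run_index, seed)` is a hypothetical callable standing in for
    "run db_stress under gdb once and return that run's record".
    """
    seeds = run_seeds(base_seed, runs)
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # pool.map preserves input order, so record i belongs to run i.
        return list(pool.map(run_one, range(runs), seeds))
```

Because every seed is derived from the single logged base_seed, rerunning with the same base_seed and run count would reproduce the same per-run seeds regardless of how the runs are scheduled.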

Test plan:

Build: make DEBUG_LEVEL=0 EXTRA_CXXFLAGS="-g -fno-inline" db_stress

1. Preflight (`verify_injection_site`) catches a build that can't support injection

Before doing any work, the runner has gdb confirm — on this exact binary — that every injection-site function resolves and gdb can read its source line. A good build logs:

INFO gdb check OK for op=compaction: rocksdb::CompactionJob::Run, rocksdb::CompactionIterator::NextFromInput, rocksdb::BlockBuilder::Add

A build that cannot support injection (functions renamed, fully inlined, or absent) fails fast with exit 2 before any run — forced here by pointing --stress_cmd at a non-db_stress binary:

$ python3 tools/cpu_corruption_injector/runner.py --op compaction --runs 1 --stress_cmd /bin/ls --report_dir /tmp/icc/preflight_demo
ERROR gdb could not set a breakpoint on these functions in ls (renamed, fully inlined, or not in this build?): rocksdb::CompactionJob::Run, rocksdb::CompactionIterator::NextFromInput, rocksdb::BlockBuilder::Add
Function "rocksdb::CompactionJob::Run" not defined.
Function "rocksdb::CompactionIterator::NextFromInput" not defined.
# exit code 2

So a broken/inlined build is rejected up front instead of silently producing NO_INJECTION runs.
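One way to turn gdb's output into the fail-fast decision shown above is to scan it for the `Function "..." not defined.` lines. The sketch below assumes that message format (taken from the log above) and uses hypothetical helper names; it is not the actual `verify_injection_site` code.

```python
import re

# Matches gdb's message for a symbol it cannot resolve, as seen in the log above.
_NOT_DEFINED = re.compile(r'^Function "(?P<fn>.+)" not defined\.$')


def missing_functions(gdb_output):
    """Return the function names gdb reported as not defined, in order."""
    missing = []
    for line in gdb_output.splitlines():
        match = _NOT_DEFINED.match(line.strip())
        if match:
            missing.append(match.group("fn"))
    return missing


def preflight_exit_code(gdb_output):
    """Exit 2 when any injection-site function is unresolved, else 0."""
    return 2 if missing_functions(gdb_output) else 0
```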

2. Compaction op -- 100 runs

Runs' outcomes (summary.json):

SDC	CORRUPTION	CRASH	NO_EFFECT	NO_INJECTION	ERROR
9	5	6	79	1	0

Spread:

target_fn x outcome: NextFromInput {SDC 9, CORRUPTION 3, CRASH 2, NO_EFFECT 34, NO_INJECTION 1}; BlockBuilder::Add {CORRUPTION 2, CRASH 4, NO_EFFECT 45}.
corruption_type x outcome: bit_flip {SDC 8, CORRUPTION 3, CRASH 6, NO_EFFECT 32}; flag_flip {SDC 1, CORRUPTION 2, NO_EFFECT 42}; lane_bit_flip {NO_EFFECT 5}.

Analysis: all 9 SDCs land on the read/iterate path (NextFromInput); corrupting the output writer (BlockBuilder::Add) never produced an SDC (its blocks are checksummed -- corruption there is caught or inert). The 5 detected CORRUPTIONs are compaction's key-order and record-count cross-checks firing (both CompactRange and CompactFiles origins appear), correctly bucketed as CORRUPTION by the bucketization fix (classify() reads the recorded data_corruption result before the crash signal).
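As a quick consistency check, the per-target_fn breakdown can be summed back into the summary row. The numbers below are copied from the compaction tables above; the dictionary layout and the `marginal` helper are assumptions for illustration, not the schema of summary.json.

```python
from collections import Counter

# target_fn x outcome, copied from the compaction spread above.
by_target_fn = {
    "NextFromInput": {"SDC": 9, "CORRUPTION": 3, "CRASH": 2,
                      "NO_EFFECT": 34, "NO_INJECTION": 1},
    "BlockBuilder::Add": {"CORRUPTION": 2, "CRASH": 4, "NO_EFFECT": 45},
}

# Summary row for the same 100 compaction runs.
summary = {"SDC": 9, "CORRUPTION": 5, "CRASH": 6,
           "NO_EFFECT": 79, "NO_INJECTION": 1, "ERROR": 0}


def marginal(breakdown):
    """Sum a {group: {outcome: count}} breakdown into per-outcome totals."""
    totals = Counter()
    for counts in breakdown.values():
        totals.update(counts)
    return totals


totals = marginal(by_target_fn)
# Counter returns 0 for outcomes that never occurred (ERROR here).
consistent = all(totals[outcome] == n for outcome, n in summary.items())
```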

A representative compaction SDC: `run_00000`

What we corrupted (inject.json):

{"op":"compaction","op_index":17,"entry_fn":"rocksdb::CompactionJob::Run","target_fn":"rocksdb::CompactionIterator::NextFromInput","injection_result":"injected","db_stress_crash_signal":null,
 "corruptions":[{"instruction":"mov    %rdx,0x160(%r12)","register":"rdx","corruption_type":"bit_flip","before":"0x10","after":"0x18",
   "details":{"source":"rocksdb::CompactionIterator::NextFromInput @ db/compaction/compaction_iterator.cc:719"}}]}

The recorded silent corruption (data_corruption.<tid>.json):

{"kind":"wrong-value","cf":0,"key":70,"value_from_db":"010000000504070609080B0A0D0C0F0E070F105E78787878","value_from_expected":"010000000504070609080B0A0D0C0F0E","op_status":"Get: OK"}

Walkthrough: a single bit flip on rdx (0x10 -> 0x18, 16 -> 24) at CompactionIterator::NextFromInput (compaction_iterator.cc:719) -- rdx holds the value LENGTH stored into the iterator's record (offset 0x160). The length reads 24 instead of 16, so compaction copies a value 8 bytes too long into the output SST, absorbing adjacent bytes. The internal key is untouched (ParseInternalKey passes); the over-long value is the single Slice fed to both the paranoid validator and the SST builder, so the file is self-consistent and every checksum agrees. On read-back Get(key=70) returns OK with the wrong bytes -- value_from_db is the expected value (...0F0E) plus 8 trailing bytes (070F105E78787878). Silent: read OK, all checks pass, the value visibly grew. classify() routes kind=wrong-value to the SDC bucket.
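The arithmetic in this walkthrough can be checked directly from the two records. The helper name `flipped_bits` is hypothetical; the hex values are the ones recorded above.

```python
def flipped_bits(before, after):
    """Return the bit positions (0 = least significant) that differ."""
    diff = before ^ after
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


# rdx before/after from inject.json: 0x10 (16) became 0x18 (24).
length_flip = flipped_bits(0x10, 0x18)

# Values from data_corruption.<tid>.json.
expected = bytes.fromhex("010000000504070609080B0A0D0C0F0E")
from_db = bytes.fromhex("010000000504070609080B0A0D0C0F0E070F105E78787878")

# The DB value is the expected value plus the bytes absorbed by the longer length.
extra = from_db[len(expected):]
```

Here `length_flip` is `[3]` (a single flip of bit 3 adds 8 to the length), and `extra` is the 8 trailing bytes `070F105E78787878`.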

A representative compaction CORRUPTION (detected): `run_00007`

What we corrupted (inject.json):

{"op":"compaction","op_index":41,"entry_fn":"rocksdb::CompactionJob::Run","target_fn":"rocksdb::CompactionIterator::NextFromInput","injection_result":"injected","db_stress_crash_signal":"SIGABRT",
 "corruptions":[{"instruction":"mov    (%rbx),%rdi","register":"rdi","corruption_type":"bit_flip","before":"0x7fffeee8c1c0","after":"0x7fffeee8c1c2",
   "details":{"source":"rocksdb::IterKey::SetKeyImpl @ ./db/dbformat.h:941","call_chain":["rocksdb::IterKey::SetKeyImpl @ ./db/dbformat.h:941","rocksdb::CompactionIterator::NextFromInput @ db/compaction/compaction_iterator.cc:781"]}}]}

The recorded detection (data_corruption.<tid>.json):

{"kind":"detected-corruption","cf":-1,"key":-1,"value_from_db":"","value_from_expected":"","op_status":"compactfiles: Corruption: Compaction sees out-of-order keys."}

Walkthrough: a bit flip on rdi (...c1c0 -> ...c1c2, a key pointer) at IterKey::SetKeyImpl (dbformat.h:941), reached from NextFromInput, mis-sets the iterator's key so the next emitted key is out of order. Compaction's key-order check catches it and returns compactfiles: Corruption: Compaction sees out-of-order keys. The op then takes SIGABRT, but classify() reads the recorded data_corruption result before the crash signal, so the run is correctly bucketed CORRUPTION (the bucketization fix; pre-fix this surfaced as CRASH). classify() routes kind=detected-corruption to the CORRUPTION bucket.
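The precedence described here (a recorded data_corruption result wins over the crash signal) can be sketched as below. This is a guess at the shape of the logic, not the real `classify()`: the function name, its parameters, and the placement of NO_INJECTION are assumptions, and the ERROR bucket is left out because the description does not define it.

```python
SDC_KINDS = {"lost", "resurrected", "wrong-value"}


def classify_sketch(injection_result, crash_signal, kind):
    """Bucket one run.

    injection_result: "injected" or anything else (assumed to mean no injection).
    crash_signal: e.g. "SIGABRT", or None when db_stress did not crash.
    kind: the `kind` field of data_corruption.<tid>.json, or None if absent.
    """
    if injection_result != "injected":
        return "NO_INJECTION"
    # The recorded data_corruption result is consulted before the crash
    # signal, so a detected corruption followed by SIGABRT is not a CRASH.
    if kind in SDC_KINDS:
        return "SDC"
    if kind == "detected-corruption":
        return "CORRUPTION"
    if crash_signal is not None:
        return "CRASH"
    return "NO_EFFECT"
```

With this ordering, the `run_00007` record (`kind=detected-corruption`, `SIGABRT`) lands in CORRUPTION rather than CRASH.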

3. Flush op -- 100 runs

Runs' outcomes (summary.json):

SDC	CORRUPTION	CRASH	NO_EFFECT	NO_INJECTION	ERROR
2	12	11	74	1	0

Spread:

target_fn x outcome: BlockBuilder::Add {SDC 1, CORRUPTION 5, CRASH 5, NO_EFFECT 49}; NextFromInput {SDC 1, CORRUPTION 7, CRASH 6, NO_EFFECT 25, NO_INJECTION 1}.
corruption_type x outcome: bit_flip {SDC 2, CORRUPTION 9, CRASH 9, NO_EFFECT 32}; flag_flip {CORRUPTION 3, CRASH 2, NO_EFFECT 42}.

Analysis: flush mirrors compaction's mechanisms (shared iterator/builder). The 2 SDCs are value/key-pointer corruptions that slip past the checksums; the 12 detected CORRUPTIONs are caught by the flush-time key-order / key-size integrity checks.

A representative flush SDC: `run_00027`

What we corrupted (inject.json):

{"op":"flush","op_index":16,"entry_fn":"rocksdb::FlushJob::Run","target_fn":"rocksdb::BlockBuilder::Add","injection_result":"injected","db_stress_crash_signal":null,
 "corruptions":[{"instruction":"mov    (%rdi),%rax","register":"rax","corruption_type":"bit_flip","before":"0x7fffef059400","after":"0x7fffef059440",
   "details":{"source":"rocksdb::Slice::data @ ./include/rocksdb/slice.h:58","call_chain":["rocksdb::Slice::data @ ./include/rocksdb/slice.h:58","rocksdb::BlockBuilder::AddWithLastKeyImpl @ table/block_based/block_builder.cc:351"]}}]}

The recorded silent corruption (data_corruption.<tid>.json):

{"kind":"lost","cf":0,"key":763,"value_from_db":"","value_from_expected":"010000000504070609080B0A0D0C0F0E","op_status":"Get: NotFound"}

Walkthrough: a bit flip on rax (a key/value data pointer, ...9400 -> ...9440) at Slice::data (slice.h:58), reached from BlockBuilder::AddWithLastKeyImpl while the flush builds the output block, makes the builder read key bytes from the wrong address, so key 763's entry is written wrong and the key is dropped from the flushed SST. On read-back Get(key=763) returns NotFound for a committed key -- silent. classify() routes kind=lost to the SDC bucket.

A representative flush CORRUPTION (detected): `run_00047`

What we corrupted (inject.json):

{"op":"flush","op_index":7,"entry_fn":"rocksdb::FlushJob::Run","target_fn":"rocksdb::CompactionIterator::NextFromInput","injection_result":"injected","db_stress_crash_signal":"SIGABRT",
 "corruptions":[{"instruction":"cmp    $0x7,%rax","register":"eflags","corruption_type":"flag_flip","before":"0x216","after":"0x256",
   "details":{"source":"rocksdb::ParseInternalKey @ ./db/dbformat.h:523","call_chain":["rocksdb::ParseInternalKey @ ./db/dbformat.h:523","rocksdb::CompactionIterator::NextFromInput @ db/compaction/compaction_iterator.cc:731"]}}]}

The recorded detection (data_corruption.<tid>.json):

{"kind":"detected-corruption","cf":-1,"key":-1,"value_from_db":"","value_from_expected":"","op_status":"flush: Corruption: Corrupted Key: Internal Key too small. Size=16. "}

Walkthrough: a flag flip (eflags 0x216 -> 0x256) on a cmp $0x7,%rax branch in ParseInternalKey (dbformat.h:523), reached from NextFromInput, makes the parser mis-judge the internal-key size, so the flush emits a malformed key and the key-size integrity check returns flush: Corruption: Corrupted Key: Internal Key too small. The op takes SIGABRT; classify() reads the recorded data_corruption before the signal and buckets CORRUPTION (bucketization fix). classify() routes kind=detected-corruption to the CORRUPTION bucket.

4. Write op (`MemTable::Add`) -- two key spaces

A write injection corrupts a single MemTable::Add (a Put/Delete/DeleteRange). The corruption is reachable and attributable, but whether it surfaces as a silent write SDC depends heavily on the key space. A silent write SDC needs the affected/mispositioned key to have other live versions to fall through to -- which only happens in a dense, multi-version memtable. We therefore run two write campaigns: the default max_key=1000, then a small max_key=8. The contrast is what motivates randomizing max_key (see PR #14867 for --randomize_stress_flags).

4a. Default `max_key=1000` -- 100 runs (no silent write SDC)

Runs' outcomes (summary.json):

SDC	CORRUPTION	CRASH	NO_EFFECT	NO_INJECTION	ERROR
0	31	13	56	0	0

With a 1000-key space almost every write touches a distinct key, so a corrupted entry has no older live version to mask it: a value/key byte flip is caught at write by the per-key checksum (VerifyEncodedEntry -> CORRUPTION), and a structural flip tends to crash (CRASH) rather than silently mis-read. ERROR=0, NO_INJECTION=0. No write op silently corrupted data -- every reachable corruption was caught or crashed.

4b. Small `max_key=8` -- 100 runs (surfaces 2 silent write SDCs)

Runs' outcomes (summary.json):

SDC	CORRUPTION	CRASH	NO_EFFECT	NO_INJECTION	ERROR
2	33	8	57	0	0

Shrinking the key space makes each key hold ~125 versions (ops_per_thread / max_key), so a misplaced entry can fall through to an older version of the same key and be returned silently -- the per-key checksum (bytes intact) and on-seek verify cannot see a pure link-position error.
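A toy model may help show why multiple versions matter. It treats the memtable as an ordered list and a lookup as "return the first entry found for the key"; this is a deliberate simplification for illustration, not how InlineSkipList or MemTable::Get are implemented, and all names are hypothetical.

```python
def get(memtable, user_key):
    """Return the value of the first entry found for user_key, or None.

    Entries are (user_key, seqno, kind, value). In a correctly ordered
    memtable the newest version of a key comes first.
    """
    for key, _seqno, kind, value in memtable:
        if key == user_key:
            return None if kind == "delete" else value
    return None


# Correct order: the Delete (seqno 3) shadows the older Puts.
correct = [(1, 3, "delete", None), (1, 2, "put", "v2"), (1, 1, "put", "v1")]

# Mispositioned tombstone: same entries, bytes intact, wrong link position.
misplaced = [(1, 2, "put", "v2"), (1, 1, "put", "v1"), (1, 3, "delete", None)]

# A key with a single version has nothing older to fall through to.
single_version = [(7, 1, "put", "v1")]
```

In this model `get(correct, 1)` returns `None`, while `get(misplaced, 1)` returns `"v2"`: an older live version is served for a deleted key even though no entry's bytes changed.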

A representative write SDC: `run_00028` (skiplist misposition -> silent stale read, flush catches)

What we corrupted (inject.json):

{"op":"write","op_index":317,"entry_fn":"rocksdb::MemTable::Add","target_fn":"rocksdb::MemTable::Add","injection_result":"injected","db_stress_crash_signal":null,
 "corruptions":[{"instruction":"cmp    %rbx,-0xb8(%rbp)","register":"eflags","corruption_type":"flag_flip","before":"0x216","after":"0x217",
   "details":{"source":"rocksdb::MemTable::Add @ db/memtable.cc:1319",
              "call_chain":["rocksdb::MemTable::Add @ db/memtable.cc:1319"]}}]}

The recorded silent corruption (data_corruption.<tid>.json):

{"kind":"resurrected","cf":0,"key":1,"value_from_db":"110000001514171619181B1A1D1C1F1E0100030205040706","value_from_expected":"","op_status":"Get: OK"}

Walkthrough: a flag flip (CF, eflags 0x216 -> 0x217) on the cmp that produces KeyIsAfterNode inside InlineSkipList::Insert (inlineskiplist.h:1253; the @ memtable.cc:1319 in the record is inlining line-drift) inverts the key comparison, so the Delete tombstone for key 1 is linked at the wrong position. The stored bytes and per-key checksum are intact, so neither the checksum nor on-seek verify sees anything wrong -- on read-back Get(key=1) returns OK with key 1's live value for a key that was Deleted (kind=resurrected, silent). A follow-up Flush() in a unit-test repro does catch it: the full-scan order check returns Corruption: Out-of-order keys found in skiplist -- caught only after the silent read, not during it.
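The eflags values recorded in these runs can be decoded with the standard x86 status-flag bit positions. The helper below is a hypothetical illustration; only the before/after values come from the inject.json records in this description.

```python
# x86 EFLAGS status-flag bit positions.
EFLAGS_BITS = {0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF", 11: "OF"}


def flipped_flags(before, after):
    """Name the flags that differ between two eflags values."""
    diff = before ^ after
    return [EFLAGS_BITS.get(i, f"bit{i}")
            for i in range(diff.bit_length()) if (diff >> i) & 1]


write_sdc = flipped_flags(0x216, 0x217)         # run_00028 (write SDC)
flush_corruption = flipped_flags(0x216, 0x256)  # run_00047 (flush CORRUPTION)
```

`write_sdc` is `["CF"]`, matching the walkthrough above, and `flush_corruption` is `["ZF"]`, the zero flag consumed by the branch after `cmp $0x7,%rax`.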

A representative write CORRUPTION (detected; seen with `max_key=1000` or `8`): `run_00018`

Where run_00028's pure link-position error is invisible to the per-key checksum, this run shows a byte-level corruption that the checksum catches at write time.

What we corrupted (inject.json):

{"op":"write","op_index":106,"entry_fn":"rocksdb::MemTable::Add","target_fn":"rocksdb::MemTable::Add","injection_result":"injected","db_stress_crash_signal":null,
 "corruptions":[{"instruction":"mov    %rsi,(%rdi)","register":"rsi","corruption_type":"bit_flip","before":"0x7fffeec2a21c","after":"0x7fffeec2a25c",
   "details":{"source":"rocksdb::Slice::Slice @ ./include/rocksdb/slice.h:39",
              "call_chain":["rocksdb::Slice::Slice @ ./include/rocksdb/slice.h:39","rocksdb::GetVarint32 @ ./util/coding.h:280","rocksdb::MemTable::VerifyEncodedEntry @ db/memtable.cc:1102","rocksdb::MemTable::Add @ db/memtable.cc:1189"]}}]}

The recorded detection (data_corruption.<tid>.json):

{"kind":"detected-corruption","cf":-1,"key":-1,"value_from_db":"","value_from_expected":"","op_status":"put: Corruption: ProtectionInfo mismatch"}

Walkthrough: a bit flip on rsi (0x7fffeec2a21c -> 0x7fffeec2a25c) at Slice::Slice (slice.h:39) while MemTable::Add re-parses the just-encoded entry through VerifyEncodedEntry (memtable.cc:1102) corrupts the Slice the verifier reads, so the recomputed per-key protection info no longer matches and the put returns Corruption: ProtectionInfo mismatch.

Differential Revision: D108367345

Summary: Detection layer of the CPU corruption injector (#14858). With `--verify_cpu_corruption_dir=<dir>`, db_stress reads back the full keyspace after every write/manual flush/manual compaction op and compares it to the expected-values model, classifying any mismatch by `kind`: `lost` / `resurrected` / `wrong-value` (silent data corruption) or `detected-corruption` (a status/checksum-caught error). Each finding is written to `<dir>/data_corruption.<tid>.json` ({kind, cf, key, value_from_db, value_from_expected, op_status}) and routed through db_stress's standard `VerificationAbort` for a clean exit-1. A startup guard requires `--threads=1` and all fault injection off, so the read-back is single-writer and the only corruption present is the injected one.

Bonus: a minor refactoring of the surrounding error-handling code in these ops.

**Test plan:**

1. Startup guard rejects misconfiguration:
```
--threads=2 -> exit 1: "--verify_cpu_corruption_dir requires --threads=1"
--read_fault_one_in=5 -> exit 1: "requires all fault injection off"
```

2. No false positive (clean CORE preset run, no injection):
```
$ db_stress --verify_cpu_corruption_dir=<dir> --threads=1 (full protections, all *_fault_one_in=0)
...
exit 0; no data_corruption.<tid>.json produced; "Verification successful"
```

3. Write-path CPU corruption injection (coming up; e.g., gdb flips a register inside MemTable::Add), after which the immediate post-op read-back catches it. Real `<dir>/data_corruption.<tid>.json`:

silent data corruption -- write returned OK but the key is gone on read-back:
```
{"kind":"lost","cf":0,"key":9814,"value_from_db":"","value_from_expected":"010000000504070609080B0A0D0C0F0E","op_status":"Get: NotFound"}
```

detected corruption -- read-back Get returns Corruption via the memtable per-key checksum:
```
{"kind":"detected-corruption","cf":0,"key":139,"value_from_db":"","value_from_expected":"","op_status":"Get: Corruption: Corrupted memtable entry, per key-value checksum verification failed."}
```

4. See the outcome spread in PR [todo] for verification of detection.

Differential Revision: D107999834
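The four `kind` values can be read as a small decision over the expected value, the value read back, and the read status. The sketch below is an interpretation of the summary above, not the db_stress implementation; the function name and the use of `None` for "key absent" are assumptions.

```python
def readback_kind(expected, from_db, status_ok=True):
    """Classify one key's read-back against the expected-values model.

    expected / from_db: bytes, or None when the key is absent.
    status_ok: False when the read itself returned a Corruption status.
    Returns None when the read-back matches the model.
    """
    if not status_ok:
        return "detected-corruption"  # caught by a status/checksum check
    if expected == from_db:
        return None                   # no mismatch
    if from_db is None:
        return "lost"                 # committed key is gone
    if expected is None:
        return "resurrected"          # deleted key came back
    return "wrong-value"              # key present with different bytes
```

Only the last three outcomes are silent: the read succeeds and the caller has no signal that the data is wrong.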

Summary: This PR is the injection layer of the CPU corruption injector. It runs inside gdb and randomly corrupts a register by bit flip in exactly one db_stress op (i.e., write, foreground compaction, or flush) per stress test run. The detection layer is in db_stress (#14852); the orchestration layer is coming up.

__How one run works__

- The orchestration layer, coming up, randomly picks which stress test `op` instance (so corruption can land at different points in the LSM shape journey) and which `target_fn` of that `op` (to keep the number of instructions to step under a reasonable limit); `injector.py` in this PR randomly picks which instruction within the `target_fn` to inject (so corruption can land at different points of a `target_fn`).
- Attach: gdb starts with injector.py's parameters passed via -iex and the db_stress command after --args, so db_stress runs unmodified. Example:
```
gdb --batch --nx \
  -iex "py import sys; sys.argv=['injector.py','--op','write','--op_index','42','--entry_fn','rocksdb::MemTable::Add','--target_fn','rocksdb::MemTable::Add','--corruptions_per_op','1','--seed','7','--dir','<rundir>']" \
  -x tools/cpu_corruption_injector/injector.py \
  --args <db_stress> --threads=1 --verify_cpu_corruption_dir=<rundir> ...
```
- Navigate: The orchestration layer will pick op_index. `entry_fn` is called exactly once per op in a stress test run, so the op_index-th op is its op_index-th call. `injector_navigate.py` breaks on `entry_fn` and sets a gdb ignore-count of op_index-1 to fast-forward to the op_index-th call. It also breaks at the first `target_fn` within that `entry_fn`.
- Warm up: `injector_critical_instruction.py` chooses a "critical instruction" (one that moves key/value bytes with general-purpose or vector registers, or sets a branch flag) uniformly within the `target_fn` chosen by the orchestration layer. To do that, it needs to approximate how many such instructions the `target_fn` contains, hence this warm-up phase: it single-steps the instructions in the first encounter of `target_fn` to count them and draw the critical instruction index, then corrupts that index at a later call.
- Corrupt: on a later call of `target_fn`, `injector_critical_instruction.py` single-steps to the m-th critical instruction and bit-flips the register through `injector_register_corruption.py`. How the register is corrupted depends on the instruction. If the m-th instruction of the current `target_fn` call is not a critical instruction, it tries the next `target_fn` call until none remain.
- Record: `injector_telemetry.py` provides telemetry to capture the corruption for later analysis.

**Test plan:**

1. Isolated tests (real gdb-captured x/i fixtures): test_inject_critical_instruction
2. E2E tests on navigation, injection, and telemetry will be done in the later orchestration PR. Below is inject.json from such a run:
```
{
  "injection_result": "injected",
  "db_stress_crash_signal": null,
  "op": "write",
  "op_index": 279,
  "entry_fn": "rocksdb::MemTable::Add",
  "target_fn": "rocksdb::MemTable::Add",
  "critical_instruction_index": 37,
  "corruptions": [
    {
      "instruction": "mov %rsi,0x8c8(%rbx)",
      "register": "rsi",
      "corruption_type": "bit_flip",
      "before": "0x7fffee4c64d8",
      "after": "0x7fffee4c64c8",
      "details": {
        "source": "rocksdb::Arena::AllocateAligned @ ./fbcode/internal_repo_rocksdb/repo/memory/arena.cc:135",
        "call_chain": [
          "rocksdb::Arena::AllocateAligned @ ./fbcode/internal_repo_rocksdb/repo/memory/arena.cc:135",
          "rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)::{lambda()#1}::operator()() const @ fbcode/internal_repo_rocksdb/repo/memory/concurrent_arena.h:65",
          "rocksdb::ConcurrentArena::AllocateImpl<rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)::{lambda()#1}>(unsigned long, bool, rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)::{lambda()#1} const&) @ fbcode/internal_repo_rocksdb/repo/memory/concurrent_arena.h:145",
          "rocksdb::ConcurrentArena::AllocateAligned @ fbcode/internal_repo_rocksdb/repo/memory/concurrent_arena.h:63",
          "rocksdb::InlineSkipList<rocksdb::MemTableRep::KeyComparator const&>::AllocateNode @ fbcode/internal_repo_rocksdb/repo/memtable/inlineskiplist.h:868",
          "rocksdb::InlineSkipList<rocksdb::MemTableRep::KeyComparator const&>::AllocateKey @ fbcode/internal_repo_rocksdb/repo/memtable/inlineskiplist.h:855",
          "rocksdb::(anonymous namespace)::SkipListRep::Allocate @ ./fbcode/internal_repo_rocksdb/repo/memtable/skiplistrep.cc:36",
          "rocksdb::MemTable::Add @ ./fbcode/internal_repo_rocksdb/repo/db/memtable.cc:1157"
        ]
      }
    }
  ],
  "ops_seen": 279,
  "critical_instructions_seen": 38
}
```

Differential Revision: D107999835
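The Attach and Navigate steps can be sketched as a small command builder. The gdb flags are the ones in the example above; `build_gdb_argv` and `ignore_count_for` are hypothetical helper names, and the sketch assumes op_index is 1-based, as the "ignore-count of op_index-1" wording suggests.

```python
INJECTOR = "tools/cpu_corruption_injector/injector.py"


def build_gdb_argv(injector_args, stress_cmd):
    """Build the gdb command line for one run.

    injector_args: list of strings for injector.py (e.g. ["--op", "write"]).
    stress_cmd: the unmodified db_stress command as a list of strings.
    """
    py_argv = ["injector.py", *injector_args]
    # -iex runs before the script, so injector.py sees these as sys.argv.
    iex = f"py import sys; sys.argv={py_argv!r}"
    return ["gdb", "--batch", "--nx", "-iex", iex,
            "-x", INJECTOR, "--args", *stress_cmd]


def ignore_count_for(op_index):
    """Number of entry_fn hits to skip so gdb stops at the op_index-th call."""
    if op_index < 1:
        raise ValueError("op_index is assumed to be 1-based")
    return op_index - 1
```

Passing the argv as a list (for example to `subprocess.run`) avoids shell quoting issues with the embedded Python list literal.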

Summary: Orchestration layer of the CPU corruption injector -- the glue between detection (#14852) and injection (#14858 ). One CPU-corruption injection says little on its own. What matters is the OUTCOME DISTRIBUTION over many injections -- how often a corruption is silently absorbed (NO_EFFECT), crashes the process (CRASH), is caught by an integrity check (CORRUPTION), or more importantly slips through as a silent data corruption (SDC) and the paths frequently leading to those outcomes. A trustworthy distribution needs a (somewhat) repeatable and reproducible harness as well as a db_stress configuration in which an injected corruption is both reachable and attributable to the chosen stress test op (i.e, write, foreground compaction or flush). Therefore this PR implements a runner that can launch N independent runs for a chosen op type (i.e, write, foreground compaction or flush). Each run picks where to inject, runs db_stress under gdb via the `injector,py` (#14858), and is classified into one outcome bucket (#14852). The runner has DB_STRESS_PRESET -- the pinned db_stress config that isolates a single injected corruption (single-threaded, integrity checks on, other fault injection off, auto-compaction off). The runner also does gdb preflight that fails fast when the build or gdb cannot support injection because, for example, the hard-coded `target_fn` has changed its name, provides parallel launching of many runs and one summary.json per campaign (the outcome distribution plus each run's record). The whole run set is reproducible from one logged base_seed (run i uses base_seed + i). **Test plan:** Build: `make DEBUG_LEVEL=0 EXTRA_CXXFLAGS="-g -fno-inline" db_stress` # 1. Preflight (`verify_injection_site`) catches a build that can't support injection Before doing any work, the runner has gdb confirm — on this exact binary — that every injection-site function resolves and gdb can read its source line. 
A good build logs: ``` INFO gdb check OK for op=compaction: rocksdb::CompactionJob::Run, rocksdb::CompactionIterator::NextFromInput, rocksdb::BlockBuilder::Add ``` A build that cannot support injection (functions renamed, fully inlined, or absent) fails fast with exit 2 before any run — forced here by pointing `--stress_cmd` at a non-db_stress binary: ``` $ python3 tools/cpu_corruption_injector/runner.py --op compaction --runs 1 --stress_cmd /bin/ls --report_dir /tmp/icc/preflight_demo ERROR gdb could not set a breakpoint on these functions in ls (renamed, fully inlined, or not in this build?): rocksdb::CompactionJob::Run, rocksdb::CompactionIterator::NextFromInput, rocksdb::BlockBuilder::Add Function "rocksdb::CompactionJob::Run" not defined. Function "rocksdb::CompactionIterator::NextFromInput" not defined. # exit code 2 ``` So a broken/inlined build is rejected up front instead of silently producing `NO_INJECTION` runs. # 2. Compaction op -- 100 runs **Runs' outcomes (`summary.json`):** | SDC | CORRUPTION | CRASH | NO_EFFECT | NO_INJECTION | ERROR | | --- | --- | --- | --- | --- | --- | | 9 | 5 | 6 | 79 | 1 | 0 | **Spread:** - target_fn x outcome: `NextFromInput` {SDC 9, CORRUPTION 3, CRASH 2, NO_EFFECT 34, NO_INJECTION 1}; `BlockBuilder::Add` {CORRUPTION 2, CRASH 4, NO_EFFECT 45}. - corruption_type x outcome: `bit_flip` {SDC 8, CORRUPTION 3, CRASH 6, NO_EFFECT 32}; `flag_flip` {SDC 1, CORRUPTION 2, NO_EFFECT 42}; `lane_bit_flip` {NO_EFFECT 5}. **Analysis:** all 9 SDCs land on the read/iterate path (`NextFromInput`); corrupting the output writer (`BlockBuilder::Add`) never produced an SDC (its blocks are checksummed -- corruption there is caught or inert). The 5 detected `CORRUPTION`s are compaction's key-order and record-count cross-checks firing (both CompactRange and CompactFiles origins appear), correctly bucketed by the fix. 
### A representative compaction SDC: `run_00000` What we corrupted (`inject.json`): ```json {"op":"compaction","op_index":17,"entry_fn":"rocksdb::CompactionJob::Run","target_fn":"rocksdb::CompactionIterator::NextFromInput","injection_result":"injected","db_stress_crash_signal":null, "corruptions":[{"instruction":"mov %rdx,0x160(%r12)","register":"rdx","corruption_type":"bit_flip","before":"0x10","after":"0x18", "details":{"source":"rocksdb::CompactionIterator::NextFromInput @ db/compaction/compaction_iterator.cc:719"}}]} ``` The recorded silent corruption (`data_corruption.<tid>.json`): ```json {"kind":"wrong-value","cf":0,"key":70,"value_from_db":"010000000504070609080B0A0D0C0F0E070F105E78787878","value_from_expected":"010000000504070609080B0A0D0C0F0E","op_status":"Get: OK"} ``` **Walkthrough:** a single bit flip on `rdx` (`0x10 -> 0x18`, 16 -> 24) at `CompactionIterator::NextFromInput` (`compaction_iterator.cc:719`) -- `rdx` holds the value LENGTH stored into the iterator's record (offset `0x160`). The length reads 24 instead of 16, so compaction copies a value 8 bytes too long into the output SST, absorbing adjacent bytes. The internal key is untouched (`ParseInternalKey` passes); the over-long value is the single Slice fed to both the paranoid validator and the SST builder, so the file is self-consistent and every checksum agrees. On read-back `Get(key=70)` returns OK with the wrong bytes -- `value_from_db` is the expected value (`...0F0E`) **plus 8 trailing bytes** (`070F105E78787878`). Silent: read OK, all checks pass, the value visibly grew. `classify()` routes `kind=wrong-value` to the SDC bucket. 
### A representative compaction CORRUPTION (detected): `run_00007` What we corrupted (`inject.json`): ```json {"op":"compaction","op_index":41,"entry_fn":"rocksdb::CompactionJob::Run","target_fn":"rocksdb::CompactionIterator::NextFromInput","injection_result":"injected","db_stress_crash_signal":"SIGABRT", "corruptions":[{"instruction":"mov (%rbx),%rdi","register":"rdi","corruption_type":"bit_flip","before":"0x7fffeee8c1c0","after":"0x7fffeee8c1c2", "details":{"source":"rocksdb::IterKey::SetKeyImpl @ ./db/dbformat.h:941","call_chain":["rocksdb::IterKey::SetKeyImpl @ ./db/dbformat.h:941","rocksdb::CompactionIterator::NextFromInput @ db/compaction/compaction_iterator.cc:781"]}}]} ``` The recorded detection (`data_corruption.<tid>.json`): ```json {"kind":"detected-corruption","cf":-1,"key":-1,"value_from_db":"","value_from_expected":"","op_status":"compactfiles: Corruption: Compaction sees out-of-order keys."} ``` **Walkthrough:** a bit flip on `rdi` (`...c1c0 -> ...c1c2`, a key pointer) at `IterKey::SetKeyImpl` (`dbformat.h:941`), reached from `NextFromInput`, mis-sets the iterator's key so the next emitted key is out of order. Compaction's key-order check catches it and returns `compactfiles: Corruption: Compaction sees out-of-order keys`. The op then takes `SIGABRT`, but `classify()` reads the recorded `data_corruption` result before the crash signal, so the run is correctly bucketed `CORRUPTION` (the bucketization fix; pre-fix this surfaced as `CRASH`). `classify()` routes `kind=detected-corruption` to the `CORRUPTION` bucket. # 3. Flush op -- 100 runs **Runs' outcomes (`summary.json`):** | SDC | CORRUPTION | CRASH | NO_EFFECT | NO_INJECTION | ERROR | | --- | --- | --- | --- | --- | --- | | 2 | 12 | 11 | 74 | 1 | 0 | **Spread:** - target_fn x outcome: `BlockBuilder::Add` {SDC 1, CORRUPTION 5, CRASH 5, NO_EFFECT 49}; `NextFromInput` {SDC 1, CORRUPTION 7, CRASH 6, NO_EFFECT 25, NO_INJECTION 1}. 
- corruption_type x outcome: `bit_flip` {SDC 2, CORRUPTION 9, CRASH 9, NO_EFFECT 32}; `flag_flip` {CORRUPTION 3, CRASH 2, NO_EFFECT 42}. **Analysis:** flush mirrors compaction's mechanisms (shared iterator/builder). The 2 SDCs are a value/key-pointer corruption that slips past the checksums; the 12 corruptions are caught by the flush-time key-order / key-size integrity checks. ### A representative flush SDC: `run_00027` What we corrupted (`inject.json`): ```json {"op":"flush","op_index":16,"entry_fn":"rocksdb::FlushJob::Run","target_fn":"rocksdb::BlockBuilder::Add","injection_result":"injected","db_stress_crash_signal":null, "corruptions":[{"instruction":"mov (%rdi),%rax","register":"rax","corruption_type":"bit_flip","before":"0x7fffef059400","after":"0x7fffef059440", "details":{"source":"rocksdb::Slice::data @ ./include/rocksdb/slice.h:58","call_chain":["rocksdb::Slice::data @ ./include/rocksdb/slice.h:58","rocksdb::BlockBuilder::AddWithLastKeyImpl @ table/block_based/block_builder.cc:351"]}}]} ``` The recorded silent corruption (`data_corruption.<tid>.json`): ```json {"kind":"lost","cf":0,"key":763,"value_from_db":"","value_from_expected":"010000000504070609080B0A0D0C0F0E","op_status":"Get: NotFound"} ``` **Walkthrough:** a bit flip on `rax` (a key/value data pointer, `...9400 -> ...9440`) at `Slice::data` (`slice.h:58`), reached from `BlockBuilder::AddWithLastKeyImpl` while the flush builds the output block, makes the builder read key bytes from the wrong address, so key 763's entry is written wrong and the key is dropped from the flushed SST. On read-back `Get(key=763)` returns `NotFound` for a committed key -- silent. `classify()` routes `kind=lost` to the SDC bucket. 
### A representative flush CORRUPTION (detected): `run_00047` What we corrupted (`inject.json`): ```json {"op":"flush","op_index":7,"entry_fn":"rocksdb::FlushJob::Run","target_fn":"rocksdb::CompactionIterator::NextFromInput","injection_result":"injected","db_stress_crash_signal":"SIGABRT", "corruptions":[{"instruction":"cmp $0x7,%rax","register":"eflags","corruption_type":"flag_flip","before":"0x216","after":"0x256", "details":{"source":"rocksdb::ParseInternalKey @ ./db/dbformat.h:523","call_chain":["rocksdb::ParseInternalKey @ ./db/dbformat.h:523","rocksdb::CompactionIterator::NextFromInput @ db/compaction/compaction_iterator.cc:731"]}}]} ``` The recorded detection (`data_corruption.<tid>.json`): ```json {"kind":"detected-corruption","cf":-1,"key":-1,"value_from_db":"","value_from_expected":"","op_status":"flush: Corruption: Corrupted Key: Internal Key too small. Size=16. "} ``` **Walkthrough:** a flag flip (`eflags 0x216 -> 0x256`) on a `cmp $0x7,%rax` branch in `ParseInternalKey` (`dbformat.h:523`), reached from `NextFromInput`, makes the parser mis-judge the internal-key size, so the flush emits a malformed key and the key-size integrity check returns `flush: Corruption: Corrupted Key: Internal Key too small`. The op takes `SIGABRT`; `classify()` reads the recorded `data_corruption` before the signal and buckets `CORRUPTION` (bucketization fix). `classify()` routes `kind=detected-corruption` to the `CORRUPTION` bucket. # 4. Write op (`MemTable::Add`) -- two key spaces A write injection corrupts a single `MemTable::Add` (a Put/Delete/DeleteRange). The corruption is reachable and attributable, but whether it surfaces as a *silent* write SDC depends heavily on the key space. A silent write SDC needs the affected/mispositioned key to have other live versions to fall through to -- which only happens in a dense, multi-version memtable. We therefore run two write campaigns: the default `max_key=1000`, then a small `max_key=8`. 
The contrast is what motivates randomizing `max_key` (see PR <> for `--randomize_stress_flags`).

### 4a. Default `max_key=1000` -- 100 runs (no silent write SDC)

**Runs' outcomes (`summary.json`):**

| SDC | CORRUPTION | CRASH | NO_EFFECT | NO_INJECTION | ERROR |
| --- | --- | --- | --- | --- | --- |
| 0 | 31 | 13 | 56 | 0 | 0 |

With a 1000-key space almost every write touches a distinct key, so a corrupted entry has no older live version to mask it: a value/key byte flip is caught at write by the per-key checksum (`VerifyEncodedEntry` -> `CORRUPTION`), and a structural flip tends to crash (`CRASH`) rather than silently mis-read. `ERROR=0`, `NO_INJECTION=0`. No write op silently corrupted data -- every reachable corruption was caught or crashed.

### 4b. Small `max_key=8` -- 100 runs (surfaces 2 silent write SDCs)

**Runs' outcomes (`summary.json`):**

| SDC | CORRUPTION | CRASH | NO_EFFECT | NO_INJECTION | ERROR |
| --- | --- | --- | --- | --- | --- |
| 2 | 33 | 8 | 57 | 0 | 0 |

Shrinking the key space makes each key hold ~125 versions (`ops_per_thread` / `max_key`), so a misplaced entry can fall through to an older version of the *same* key and be returned silently -- the per-key checksum (bytes intact) and on-seek verify cannot see a pure link-position error.
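The version-density estimate can be reproduced with the small calculation below. It is a rough sketch: `OPS_PER_THREAD = 1000` is an assumption inferred from the "~125 versions" figure at `max_key=8` (this description does not state the preset value), and it treats every op as a write to a uniformly chosen key.

```python
def versions_per_key(ops_per_thread: int, max_key: int) -> float:
    """Average number of versions written per key: ops_per_thread / max_key."""
    return ops_per_thread / max_key


# Assumed value, inferred from ~125 versions at max_key=8.
OPS_PER_THREAD = 1000

for max_key in (1000, 8):
    print(f"max_key={max_key}: ~{versions_per_key(OPS_PER_THREAD, max_key):.0f} versions per key")
```

With roughly one version per key at `max_key=1000`, a mispositioned entry has nothing older to fall through to. At `max_key=8` there are on the order of a hundred candidates.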
### A representative write SDC: `run_00028` (skiplist misposition -> silent stale read, flush catches)

What we corrupted (`inject.json`):

```json
{"op":"write","op_index":317,"entry_fn":"rocksdb::MemTable::Add","target_fn":"rocksdb::MemTable::Add","injection_result":"injected","db_stress_crash_signal":null,
 "corruptions":[{"instruction":"cmp %rbx,-0xb8(%rbp)","register":"eflags","corruption_type":"flag_flip","before":"0x216","after":"0x217",
   "details":{"source":"rocksdb::MemTable::Add @ db/memtable.cc:1319",
     "call_chain":["rocksdb::MemTable::Add @ db/memtable.cc:1319"]}}]}
```

The recorded silent corruption (`data_corruption.<tid>.json`):

```json
{"kind":"resurrected","cf":0,"key":1,"value_from_db":"110000001514171619181B1A1D1C1F1E0100030205040706","value_from_expected":"","op_status":"Get: OK"}
```

**Walkthrough:** a flag flip (CF, `eflags 0x216 -> 0x217`) on the `cmp` that produces `KeyIsAfterNode` inside `InlineSkipList::Insert` (`inlineskiplist.h:1253`; the `@ memtable.cc:1319` in the record is inlining line-drift) inverts the key comparison, so the Delete tombstone for key 1 is linked at the wrong position. The stored bytes and per-key checksum are intact, so neither the checksum nor on-seek verify sees anything wrong -- on read-back `Get(key=1)` returns OK with key 1's live value for a key that was Deleted (`kind=resurrected`, silent). A follow-up `Flush()` in a unit-test repro *does* catch it: the full-scan order check returns `Corruption: Out-of-order keys found in skiplist` -- caught only after the silent read, not during it.

### A representative write CORRUPTION (detected, `max_key=1000` or `8`): `run_00018`

Where `run_00028`'s pure link-position error is invisible to the per-key checksum, this run shows a byte-level corruption that the checksum *catches* at write time.
What we corrupted (`inject.json`):

```json
{"op":"write","op_index":106,"entry_fn":"rocksdb::MemTable::Add","target_fn":"rocksdb::MemTable::Add","injection_result":"injected","db_stress_crash_signal":null,
 "corruptions":[{"instruction":"mov %rsi,(%rdi)","register":"rsi","corruption_type":"bit_flip","before":"0x7fffeec2a21c","after":"0x7fffeec2a25c",
   "details":{"source":"rocksdb::Slice::Slice @ ./include/rocksdb/slice.h:39",
     "call_chain":["rocksdb::Slice::Slice @ ./include/rocksdb/slice.h:39","rocksdb::GetVarint32 @ ./util/coding.h:280","rocksdb::MemTable::VerifyEncodedEntry @ db/memtable.cc:1102","rocksdb::MemTable::Add @ db/memtable.cc:1189"]}}]}
```

The recorded detection (`data_corruption.<tid>.json`):

```json
{"kind":"detected-corruption","cf":-1,"key":-1,"value_from_db":"","value_from_expected":"","op_status":"put: Corruption: ProtectionInfo mismatch"}
```

**Walkthrough:** a bit flip on `rsi` (`0x7fffeec2a21c -> 0x7fffeec2a25c`) at `Slice::Slice` (`slice.h:39`) while `MemTable::Add` re-parses the just-encoded entry through `VerifyEncodedEntry` (`memtable.cc:1102`) corrupts the Slice the verifier reads, so the recomputed per-key protection info no longer matches and the put returns `Corruption: ProtectionInfo mismatch`.

Differential Revision: D108367345
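Each `before`/`after` pair in the `inject.json` records above differs in exactly one bit, which can be checked by XOR-ing the two values. The helper below is an illustrative sketch, not part of the injector: the function name is hypothetical, and the flag table lists the standard x86 EFLAGS status-bit positions.

```python
# Standard x86 EFLAGS status-flag bit positions.
EFLAGS_BITS = {0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF", 11: "OF"}


def flipped_bits(before: str, after: str) -> list:
    """Return the bit positions that differ between two hex strings."""
    diff = int(before, 16) ^ int(after, 16)
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


# (register, before, after) taken from the records in this description.
records = [
    ("rax", "0x7fffef059400", "0x7fffef059440"),     # flush SDC, run_00027
    ("eflags", "0x216", "0x256"),                    # flush CORRUPTION, run_00047
    ("eflags", "0x216", "0x217"),                    # write SDC, run_00028
    ("rsi", "0x7fffeec2a21c", "0x7fffeec2a25c"),     # write CORRUPTION, run_00018
]

for register, before, after in records:
    bits = flipped_bits(before, after)
    names = [EFLAGS_BITS.get(b, "?") for b in bits] if register == "eflags" else []
    print(register, bits, names)
```

For the two `eflags` records this identifies bit 6 (ZF) in `run_00047` and bit 0 (CF) in `run_00028`; both pointer corruptions flip bit 6, a 64-byte address offset.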

meta-codesync · 2026-06-18T22:17:51Z

@hx235 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D108367345.

github-actions · 2026-06-18T22:21:03Z

✅ clang-tidy: No findings on changed lines

Completed in 99.3s.

…ook#14852)

Summary:
Detection layer of the CPU corruption injector (facebook#14858). With `--verify_cpu_corruption_dir=<dir>`, db_stress reads back the full keyspace after every write/manual flush/manual compaction op and compares it to the expected-values model, classifying any mismatch by `kind`: `lost` / `resurrected` / `wrong-value` (silent data corruption) or `detected-corruption` (a status/checksum-caught error). Each finding is written to `<dir>/data_corruption.<tid>.json` ({kind, cf, key, value_from_db, value_from_expected, op_status}) and routed through db_stress's standard `VerificationAbort` for a clean exit-1. A startup guard requires `--threads=1` and all fault injection off so the read-back is single-writer and the only corruption present is the injected one.

Bonus: a minor refactoring of the surrounding error-handling code in these ops.

**Test plan:**

1. Startup guard rejects misconfiguration:
```
--threads=2 -> exit 1: "--verify_cpu_corruption_dir requires --threads=1"
--read_fault_one_in=5 -> exit 1: "requires all fault injection off"
```

2. No false positive (clean CORE preset run, no injection):
```
$ db_stress --verify_cpu_corruption_dir=<dir> --threads=1 (full protections, all *_fault_one_in=0)
... exit 0; no data_corruption.<tid>.json produced; "Verification successful"
```

3. Write-path CPU corruption injection (coming up, e.g., gdb flips a register inside MemTable::Add), then the immediate post-op read-back catches it. Real `<dir>/data_corruption.<tid>.json`:

silent data corruption -- write returned OK but the key is gone on read-back:
```
{"kind":"lost","cf":0,"key":9814,"value_from_db":"","value_from_expected":"010000000504070609080B0A0D0C0F0E","op_status":"Get: NotFound"}
```
detected corruption -- read-back Get returns Corruption via the memtable per-key checksum:
```
{"kind":"detected-corruption","cf":0,"key":139,"value_from_db":"","value_from_expected":"","op_status":"Get: Corruption: Corrupted memtable entry, per key-value checksum verification failed."}
```

4. See PR facebook#14866 test plan's spread in the outcome for verification of detection

Differential Revision: D107999834
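The silent-corruption `kind` values follow from comparing what the DB returns with what the expected-values model holds. The function below sketches that comparison in Python for illustration only; db_stress itself is C++, and the function name and the use of `None` for "key absent" are assumptions (the JSON records use an empty string for the absent side).

```python
from typing import Optional


def mismatch_kind(value_from_db: Optional[str],
                  value_from_expected: Optional[str]) -> Optional[str]:
    """Hypothetical sketch: name the silent mismatch for one key, or None if it matches.

    None stands for "key absent" (Get: NotFound on the DB side, deleted or
    never written on the expected side).
    """
    if value_from_db == value_from_expected:
        return None            # read-back agrees with the model
    if value_from_db is None:
        return "lost"          # model has the key, DB does not
    if value_from_expected is None:
        return "resurrected"   # DB returns a key the model says is gone
    return "wrong-value"       # both present, contents differ


# The "lost" record above: expected value present, Get returned NotFound.
print(mismatch_kind(None, "010000000504070609080B0A0D0C0F0E"))
```

`detected-corruption` is not covered by this comparison: it is recorded when an op or read-back returns a Corruption status rather than a value.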

Hui Xiao added 3 commits June 18, 2026 15:17

meta-cla Bot added the CLA Signed label Jun 18, 2026

meta-codesync Bot added the meta-exported label Jun 18, 2026

hx235 changed the title ~~CPU corruption injector: runner + db_stress preset~~ CPU corruption injector: runner + db_stress preset flags Jun 18, 2026

hx235 mentioned this pull request Jun 18, 2026

Verify CPU corruption after op via --verify_cpu_corruption_dir (#14852) #14852

Open

hx235 wants to merge 3 commits into `main` from `export-D108367345`
