Skip to content

Verify CPU corruption after op via --verify_cpu_corruption_dir (#14852)#14852

Open
hx235 wants to merge 1 commit into
facebook:mainfrom
hx235:export-D107999834
Open

Verify CPU corruption after op via --verify_cpu_corruption_dir (#14852)#14852
hx235 wants to merge 1 commit into
facebook:mainfrom
hx235:export-D107999834

Conversation

@hx235

@hx235 hx235 commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Summary:

Detection layer of the CPU corruption injector (#14858). With --verify_cpu_corruption_dir=<dir>, db_stress reads back the full keyspace after every write/manual flush/manual compaction op and compares it to the expected-values model, classifying any mismatch by kind: lost / resurrected / wrong-value (silent data corruption) or detected-corruption (a status/checksum-caught error). Each finding is written to <dir>/data_corruption.<tid>.json ({kind, cf, key, value_from_db, value_from_expected, op_status}) and routed through db_stress's standard VerificationAbort for a clean exit-1. A startup guard requires --threads=1 and all fault injection off so the read-back is single-writer and the only corruption present is the injected one

Bonus: a minor refactoring into the surrounding error handling code in these ops

Test plan:

1.Startup guard rejects misconfiguration:

--threads=2           -> exit 1: "--verify_cpu_corruption_dir requires --threads=1"
--read_fault_one_in=5 -> exit 1: "requires all fault injection off"

2.No false positive (clean CORE preset run, no injection):

$ db_stress --verify_cpu_corruption_dir=<dir> --threads=1 (full protections, all *_fault_one_in=0) ...
exit 0; no data_corruption.<tid>.json produced; "Verification successful"

3.Write-path cpu corruption injection (coming up, e.g, gdb flips a register inside MemTable::Add), then the immediate post-op read-back catches it. Real <dir>/data_corruption.<tid>.json:

silent data corruption -- write returned OK but the key is gone on read-back:

{"kind":"lost","cf":0,"key":9814,"value_from_db":"","value_from_expected":"010000000504070609080B0A0D0C0F0E","op_status":"Get: NotFound"}

detected corruption -- read-back Get returns Corruption via the memtable per-key checksum:

{"kind":"detected-corruption","cf":0,"key":139,"value_from_db":"","value_from_expected":"","op_status":"Get: Corruption: Corrupted memtable entry, per key-value checksum verification failed."

4.See PR #14866 test plan's spread in the outcome for verification of detection

Differential Revision: D107999834

@meta-cla meta-cla Bot added the CLA Signed label Jun 15, 2026
@meta-codesync

meta-codesync Bot commented Jun 15, 2026

Copy link
Copy Markdown

@hx235 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D107999834.

@github-actions

github-actions Bot commented Jun 15, 2026

Copy link
Copy Markdown

✅ clang-tidy: No findings on changed lines

Completed in 91.2s.

@github-actions

github-actions Bot commented Jun 15, 2026

Copy link
Copy Markdown

Codex Code Review - OBSOLETE

Superseded by a newer AI review. Expand to see the original review.

🟡 Codex Code Review

Auto-triggered after CI passed — reviewing commit 9aebab1


Codex review failed before producing findings.

WARNING: proceeding, even though we could not create PATH aliases: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

  • Codex may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /codex-review [context] — Request a code review
  • /codex-query <question> — Ask about the PR or codebase

@github-actions

github-actions Bot commented Jun 15, 2026

Copy link
Copy Markdown

Claude Code Review - OBSOLETE

Superseded by a newer AI review. Expand to see the original review.

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit 9aebab1


Summary

This PR adds a useful CPU corruption detection layer to db_stress via --verify_cpu_corruption_dir, performing full-keyspace readback after each write/flush/compaction op. The API changes (void->Status for TestCompactRange/TestCompactFiles) are clean. However, the verification logic has several correctness gaps that would cause crashes in debug builds and false results with certain configurations.

High-severity findings (2):

  • [db_stress_test_base.cc:952] ExpectedValue::Exists() called without checking pending state -- will assert-crash in debug builds if any key has PendingWrite/PendingDelete flags set.
  • [db_stress_test_base.cc:1007] Missing ReadOptions.timestamp when FLAGS_user_timestamp_size > 0 -- all readback Get() calls will fail or return wrong results with user-defined timestamps enabled.
Full review (click to expand)

Findings

🔴 HIGH

H1. ExpectedValue::Exists() assert crash -- db_stress_test_base.cc:952
  • Issue: ClassifyReadBack() calls expected_value.Exists() which contains assert(!PendingWrite() && !PendingDelete()) (expected_value.h:41). If any key in the full-keyspace scan has a pending flag, this crashes in debug builds. In release builds, the assert is compiled out and behavior is undefined (IsDeleted may return incorrect results for pending-delete keys).
  • Root cause: The PR does not check for pending state before calling Exists(). While --threads=1 ensures the just-completed op's expected state is committed, background operations or edge cases could leave other keys in pending state. More importantly, this diverges from the established pattern in VerifyOrSyncValue() (no_batched_ops_stress.cc:3268) which explicitly checks and handles pending state.
  • Suggested fix: Before calling Exists(), check expected_value.PendingWrite() || expected_value.PendingDelete() and either skip the key or sync the state from the DB, following the VerifyOrSyncValue() pattern.
H2. Missing user-defined timestamp support -- db_stress_test_base.cc:1007
  • Issue: MaybeVerifyCpuCorruption() creates ReadOptions read_opts with verify_checksums = true but does NOT set read_opts.timestamp when FLAGS_user_timestamp_size > 0. Every db_->Get() call will either fail or return incorrect results when UDT is enabled.
  • Root cause: The canonical verification pattern (no_batched_ops_stress.cc:40-44) sets options.timestamp to GetNowNanos(). The PR omits this.
  • Suggested fix: Add timestamp handling:
    std::string ts_str;
    Slice ts;
    if (FLAGS_user_timestamp_size > 0) {
      ts_str = GetNowNanos();
      ts = ts_str;
      read_opts.timestamp = &ts;
    }

🟡 MEDIUM

M1. No guard for non-state-tracked stress tests -- db_stress_test_base.cc:998
  • Issue: MaybeVerifyCpuCorruption() uses shared->Get(cf, key) which relies on expected state tracking. Three of four stress test subclasses set IsStateTracked() = false: BatchedOpsStressTest, MultiOpsTxnsStressTest, CfConsistencyStressTest. These tests don't call PreparePut/PrepareDelete, so expected state is never populated -- every key appears deleted, producing false "lost" reports for every existing key.
  • Root cause: The method is in the base class but only works with NonBatchedOpsStressTest.
  • Suggested fix: Add early return: if (!IsStateTracked()) return; or validate at startup that verify_cpu_corruption_dir is incompatible with test_batches_snapshots, test_cf_consistency, and test_multi_ops_txns.
M2. Incomplete fault injection validation -- db_stress_tool.cc:196
  • Issue: Startup validation checks most *_fault_one_in flags but misses secondary_cache_fault_one_in (declared in db_stress_gflags.cc:1301). Secondary cache faults could inject errors during readback Get() calls, tainting results.
  • Suggested fix: Add FLAGS_secondary_cache_fault_one_in != 0 to the validation block.
M3. Insufficient error status handling in ClassifyReadBack -- db_stress_test_base.cc:949
  • Issue: After checking IsCorruption(), the code asserts read_status.ok() || read_status.IsNotFound(). Even with fault injection off, other statuses are possible: IOError (disk issues), Incomplete (prefix bloom), TimedOut. The assert would crash in debug; in release, the status would be silently misclassified.
  • Suggested fix: Handle unexpected statuses gracefully -- either skip the key with a warning or treat non-OK/non-NotFound as "detected-corruption" with the actual status.
M4. Missing verification for ingest operations -- db_stress_test_base.cc (OperateDb)
  • Issue: TestIngestExternalFile and ingest_wbwi_one_in operations modify the DB but are NOT followed by MaybeVerifyCpuCorruption(). These operations can significantly change DB state, creating a gap in corruption detection coverage.
  • Suggested fix: Either add MaybeVerifyCpuCorruption() after ingest operations or document the exclusion.

🟢 LOW / NIT

L1. Code duplication with VerifyDb -- db_stress_test_base.cc:1003
  • Issue: The PR's full-keyspace readback duplicates logic from VerifyOrSyncValue() (no_batched_ops_stress.cc:3260). The TODO comment at line 1003 acknowledges this. The canonical verifier handles more edge cases (pending state, timestamps, wide columns, secondary DB). Divergence over time is a maintenance risk.
  • Suggested fix: Follow through on the TODO to extract a shared per-key verification helper.
L2. Wide column validation gap -- db_stress_test_base.cc:1013
  • Issue: When FLAGS_use_put_entity_one_in > 0, the PR uses db_->Get() which returns the default column value. This is functionally correct for value comparison, but does not validate wide column consistency (non-default columns could be corrupted without detection). The canonical VerifyDb calls VerifyWideColumns().
  • Suggested fix: Consider using GetEntity() with VerifyWideColumns() for comprehensive coverage when PutEntity is in use.
L3. fopen without directory existence check -- db_stress_test_base.cc:977
  • Issue: WriteDataCorruption() calls fopen() on a path under FLAGS_verify_cpu_corruption_dir without verifying the directory exists. If the directory doesn't exist, fopen fails and the error is only printed to stderr -- the corruption finding is lost.
  • Suggested fix: Validate directory existence at startup or create it if missing.
L4. Performance documentation -- db_stress_gflags.cc:133
  • Issue: The flag documentation doesn't mention the extreme performance cost of full-keyspace readback after every op (potentially millions of Get calls per write). Users should understand this is designed for single-op CPU-fault-injection debugging, not general stress testing.
  • Suggested fix: Add a note about performance implications to the flag description.

Cross-Component Analysis

Context Compatible? Notes
NonBatchedOpsStressTest YES Primary target, state tracking works
BatchedOpsStressTest NO IsStateTracked()=false, expected state not populated
MultiOpsTxnsStressTest NO IsStateTracked()=false, different data model
CfConsistencyStressTest NO IsStateTracked()=false
User-defined timestamps NO ReadOptions.timestamp not set (H2)
Wide columns / PutEntity PARTIAL Default column correct, non-default not validated
Merge operations YES db_->Get() returns merged result correctly
Background compaction YES Compaction preserves data visibility; compaction filter is already mutually exclusive with most features

Assumption stress-test results:

  • "Single writer": Valid with --threads=1. Background compaction doesn't change logical data visibility (only reorganizes files). Compaction filters are already excluded by existing flag validation.
  • "Fault injection off": Mostly valid but secondary_cache_fault_one_in is missed (M2).
  • "Only OK/NotFound statuses": False -- IOError and other statuses possible even without injection (M3).

Positive Observations

  • The void-to-Status refactor for TestCompactRange and TestCompactFiles is clean and correctly propagates through the only override (MultiOpsTxnsStressTest).
  • The DataCorruption struct with Slice fields has correct lifetime management -- Slices reference caller-owned buffers (db_value string and expected_scratch array) that outlive the struct usage.
  • JSON output uses std::quoted() for op_status field escaping and hex encoding for values -- safe against injection.
  • The early-return pattern in MaybeVerifyCpuCorruption() (checking FLAGS_verify_cpu_corruption_dir.empty()) has negligible overhead on the normal (disabled) path.
  • Placing verification before re-enabling fault injection (see "put" case comments) is correct and well-documented.

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase

@meta-codesync meta-codesync Bot changed the title Verify CPU corruption after each op via --verify_cpu_corruption_dir Verify CPU corruption after each op via --verify_cpu_corruption_dir (#14852) Jun 16, 2026
@hx235 hx235 force-pushed the export-D107999834 branch from 9aebab1 to 9e499d3 Compare June 16, 2026 05:03
hx235 pushed a commit to hx235/rocksdb that referenced this pull request Jun 16, 2026
…acebook#14852)

Summary:

Detection layer of the CPU corruption injector (coming up). With `--verify_cpu_corruption_dir=<dir>`, db_stress reads back the full keyspace after every write/flush/compaction op and compares it to the expected-values model, classifying any mismatch by `kind`: `lost` / `resurrected` / `wrong-value` (silent data corruption) or `detected-corruption` (a status/checksum-caught error). Each finding is written to `<dir>/data_corruption.<tid>.json` ({kind, cf, key, value_from_db, value_from_expected, op_status}) and routed through db_stress's standard `VerificationAbort` for a clean exit-1. A startup guard requires `--threads=1` and all fault injection off so the read-back is single-writer and the only corruption present is the injected one

Differential Revision: D107999834
@hx235

hx235 commented Jun 16, 2026

Copy link
Copy Markdown
Contributor Author

Fixed meaningful AI feedback

hx235 pushed a commit to hx235/rocksdb that referenced this pull request Jun 16, 2026
…acebook#14852)

Summary:

Detection layer of the CPU corruption injector (coming up). With `--verify_cpu_corruption_dir=<dir>`, db_stress reads back the full keyspace after every write/flush/compaction op and compares it to the expected-values model, classifying any mismatch by `kind`: `lost` / `resurrected` / `wrong-value` (silent data corruption) or `detected-corruption` (a status/checksum-caught error). Each finding is written to `<dir>/data_corruption.<tid>.json` ({kind, cf, key, value_from_db, value_from_expected, op_status}) and routed through db_stress's standard `VerificationAbort` for a clean exit-1. A startup guard requires `--threads=1` and all fault injection off so the read-back is single-writer and the only corruption present is the injected one

**Test plan:**
1.Startup guard rejects misconfiguration:
```
--threads=2           -> exit 1: "--verify_cpu_corruption_dir requires --threads=1"
--read_fault_one_in=5 -> exit 1: "requires all fault injection off"
```
2.No false positive (clean CORE preset run, no injection):
```
$ db_stress --verify_cpu_corruption_dir=<dir> --threads=1 (full protections, all *_fault_one_in=0) ...
exit 0; no data_corruption.<tid>.json produced; "Verification successful"
```
3.Write-path cpu corruption injection (coming up, e.g, gdb flips a register inside MemTable::Add), then the immediate post-op read-back catches it. Real `<dir>/data_corruption.<tid>.json`:

silent data corruption -- write returned OK but the key is gone on read-back:
```
{"kind":"lost","cf":0,"key":9814,"value_from_db":"","value_from_expected":"010000000504070609080B0A0D0C0F0E","op_status":"Get: NotFound"}
```
detected corruption -- read-back Get returns Corruption via the memtable per-key checksum:
```
{"kind":"detected-corruption","cf":0,"key":139,"value_from_db":"","value_from_expected":"","op_status":"Get: Corruption: Corrupted memtable entry, per key-value checksum verification failed."
```

Differential Revision: D107999834
@hx235 hx235 force-pushed the export-D107999834 branch from 9e499d3 to aafd57d Compare June 16, 2026 05:21
@hx235

hx235 commented Jun 16, 2026

Copy link
Copy Markdown
Contributor Author

Fix format

@github-actions

Copy link
Copy Markdown

🟡 Codex Code Review

Auto-triggered after CI passed — reviewing commit aafd57d


Codex review failed before producing findings.

WARNING: proceeding, even though we could not create PATH aliases: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

  • Codex may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /codex-review [context] — Request a code review
  • /codex-query <question> — Ask about the PR or codebase

@github-actions

Copy link
Copy Markdown

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit aafd57d


Summary

Well-structured, focused PR that adds CPU corruption detection to db_stress with appropriate safety guards. The design is sound: full-keyspace read-back after each op under single-writer constraints. The void-to-Status refactoring of TestCompactRange/TestCompactFiles is clean and correctly updates the only override (MultiOpsTxnsStressTest).

High-severity findings (0):

No high-severity findings.

Full review (click to expand)

Findings

🔴 HIGH

None.

🟡 MEDIUM

M1. Missing startup guard for --num_dbs > 1 with --verify_cpu_corruption_dir -- db_stress_tool/db_stress_tool.cc
  • Issue: The startup guard validates --threads=1 but does not check FLAGS_num_dbs. With --num_dbs=2, two StressTest instances run on separate threads, each with its own DB and expected state. The output files use thread->tid which resets per DB instance (thread IDs are 0-based per DB). With two DB instances both having tid=0, the output file data_corruption.0.json would be written by both instances to the same directory, causing a race.
  • Root cause: FLAGS_verify_cpu_corruption_dir is a global (per-process) flag, but thread IDs are per-DB.
  • Suggested fix: Either reject --num_dbs > 1 in the startup guard, or include the DB index in the output filename (e.g., data_corruption.<db_index>.<tid>.json).
M2. No snapshot used during full-keyspace read-back -- db_stress_test_base.cc:1044-1059
  • Issue: MaybeVerifyCpuCorruption issues individual db_->Get() calls without a snapshot. Between successive Get() calls, background compaction/flush could run. There is a theoretical window where TTL expiration (if FLAGS_ttl > 0) could cause a key to disappear between the write and a later read-back Get(), leading to a false "lost" classification.
  • Root cause: The canonical VerifyDb also reads without a snapshot in its Get path, so this follows existing convention. However, VerifyDb runs at quiescent points, while this runs mid-operation-loop.
  • Suggested fix: Consider rejecting --ttl > 0 in the startup guard, or use a snapshot for the read-back.
M3. ClassifyReadBack does not handle PendingWrite/PendingDelete state -- db_stress_test_base.cc:949-981
  • Issue: Unlike the canonical VerifyOrSyncValue (no_batched_ops_stress.cc:3301) which checks for pending state and syncs rather than asserting, ClassifyReadBack calls expected_value.Exists() which asserts !PendingWrite() && !PendingDelete(). Under --threads=1, this is safe because TestPut/TestDelete/TestDeleteRange commit/rollback before returning. However, this relies on a non-locally-obvious invariant.
  • Suggested fix: Add a comment at the Exists() call explaining WHY it's safe (the commit/rollback-before-return guarantee under --threads=1).

🟢 LOW / NIT

L1. Output directory not validated at startup -- db_stress_tool/db_stress_tool.cc
  • Issue: The startup guard does not verify that FLAGS_verify_cpu_corruption_dir exists or is writable. If the directory doesn't exist, fopen returns NULL, stderr message prints, but VerificationAbort still triggers. JSON file is supplementary.
  • Suggested fix: Consider validating the directory at startup or documenting it must exist.
L2. Full-keyspace scan after no-op compaction -- db_stress_test_base.cc:1549
  • Issue: TestCompactFiles returns Status::OK() when no compaction is performed. Full read-back still runs. Wasteful but harmless for a debugging tool.

Cross-Component Analysis

Context Assumptions hold? Action needed?
--num_dbs > 1 File collision risk (M1) Guard or fix filename
--enable_compaction_filter Safe - only removes already-deleted keys None
--use_txn + commit_bypass_memtable Safe - committed before read-back None
--ttl > 0 Potential false positive (M2) Consider guarding
User-defined timestamps Handled correctly None
BlobDB Safe - Get() handles blob indirection None
Background compaction/flush Safe - doesn't modify expected state None

Positive Observations

  1. Clean refactoring: The void -> Status change for TestCompactRange/TestCompactFiles is minimal and correctly updates the only override.
  2. Proper fault injection window: MaybeVerifyCpuCorruption is called BEFORE re-enabling fault injection, ensuring clean read-back.
  3. Defensive unexpected-status handling: ImmediateExit(1) for non-OK/non-NotFound/non-Corruption statuses is appropriate.
  4. Good startup guards: Catches test mode incompatibilities and all fault injection flags.
  5. Tech debt acknowledgment: The TODO noting duplication with VerifyDb and the refactoring plan is good practice.
  6. Correct user-timestamp handling: Read-back correctly sets a read timestamp when FLAGS_user_timestamp_size > 0.

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase

meta-codesync Bot pushed a commit that referenced this pull request Jun 16, 2026
…14852)

Summary:

Detection layer of the CPU corruption injector (coming up). With `--verify_cpu_corruption_dir=<dir>`, db_stress reads back the full keyspace after every write/flush/compaction op and compares it to the expected-values model, classifying any mismatch by `kind`: `lost` / `resurrected` / `wrong-value` (silent data corruption) or `detected-corruption` (a status/checksum-caught error). Each finding is written to `<dir>/data_corruption.<tid>.json` ({kind, cf, key, value_from_db, value_from_expected, op_status}) and routed through db_stress's standard `VerificationAbort` for a clean exit-1. A startup guard requires `--threads=1` and all fault injection off so the read-back is single-writer and the only corruption present is the injected one

**Test plan:**
1.Startup guard rejects misconfiguration:
```
--threads=2           -> exit 1: "--verify_cpu_corruption_dir requires --threads=1"
--read_fault_one_in=5 -> exit 1: "requires all fault injection off"
```
2.No false positive (clean CORE preset run, no injection):
```
$ db_stress --verify_cpu_corruption_dir=<dir> --threads=1 (full protections, all *_fault_one_in=0) ...
exit 0; no data_corruption.<tid>.json produced; "Verification successful"
```
3.Write-path cpu corruption injection (coming up, e.g, gdb flips a register inside MemTable::Add), then the immediate post-op read-back catches it. Real `<dir>/data_corruption.<tid>.json`:

silent data corruption -- write returned OK but the key is gone on read-back:
```
{"kind":"lost","cf":0,"key":9814,"value_from_db":"","value_from_expected":"010000000504070609080B0A0D0C0F0E","op_status":"Get: NotFound"}
```
detected corruption -- read-back Get returns Corruption via the memtable per-key checksum:
```
{"kind":"detected-corruption","cf":0,"key":139,"value_from_db":"","value_from_expected":"","op_status":"Get: Corruption: Corrupted memtable entry, per key-value checksum verification failed."
```

Differential Revision: D107999834
meta-codesync Bot pushed a commit that referenced this pull request Jun 16, 2026
)

Summary:

Injection layer of the CPU corruption injector (tools/cpu_corruption_injector/injector.py), runs inside gdb and corrupt a register by bit flip in exactly one db_stress op (i.e, write, foreground compaction and flush) per stress test run. Detection is at db_stress (#14852); orchestration is coming up.

How one run works 
- The orchestration layer, coming up, randomly picks which op instance (so corruption lands at different points in the LSM's life) and which target_fn per run (so it has a reasonable number of instructions to step under a reasonable time limit); injector.py picks which instruction within target_fn.
- Attach: gdb starts with injector.py's parameters passed via -iex and the db_stress command after --args, so db_stress runs unmodified. Example:
```
gdb --batch --nx \
  -iex "py import sys; sys.argv=['injector.py','--op','write','--op_index','42','--entry_fn','rocksdb::MemTable::Add','--target_fn','rocksdb::MemTable::Add','--corruptions_per_op','1','--seed','7','--dir','<rundir>']" \
  -x tools/cpu_corruption_injector/injector.py \
  --args <db_stress> --threads=1 --verify_cpu_corruption_dir=<rundir>  ...
```
- Reach the op: entry_fn is called exactly once per stress test run's op so the op_index-th op is its op_index-th call. The orchestration layer picks op_index . `injector_navigate.py` breaks on entry_fn and set a gdb ignore-count of op_index-1 to fast-forward to op_index-th one. 

- Warm up: `injector_critical_instruction.py` will choose "critical instruction" (those that move key/value bytes with general-purpose or vector registers or set a branch flag) uniformly within the chosen `target_fn` (within `entry_fn`) by the orchestration layer. In order to do that, it needs to approximate how many such instructions within `target_fn`. Hence we have this warm-up phase. It single-steps the first call of target_fn to count and pick the critical instruction index, then corrupt that index at a later call. 

- Corrupt: on a later call of target_fn, `injector_critical_instruction.py` single-step to the m-th critical instruction and bit-flip the register through `injector_register_corruption.py`. The way to corrupt register depends on what instruction it is. 

- Record: `injector_telemetry.py` provides telemetry to capture the corruption for later analysis.


**Test plan:**
1. Isolated tests (real gdb-captured x/i fixtures): test_inject_critical_instruction 6/6

2. E2E test on navigation, inject, telemetry will be done in the later orchestration PR. Below is inject.json from such run
```
{
  "injection_result": "injected",
  "db_stress_crash_signal": null,
  "op": "write",
  "op_index": 279,
  "entry_fn": "rocksdb::MemTable::Add",
  "target_fn": "rocksdb::MemTable::Add",
  "critical_instruction_index": 37,
  "corruptions": [
    {
      "instruction": "mov    %rsi,0x8c8(%rbx)",
      "register": "rsi",
      "corruption_type": "bit_flip",
      "before": "0x7fffee4c64d8",
      "after": "0x7fffee4c64c8",
      "details": {
        "source": "rocksdb::Arena::AllocateAligned @ ./fbcode/internal_repo_rocksdb/repo/memory/arena.cc:135",
        "call_chain": [
          "rocksdb::Arena::AllocateAligned @ ./fbcode/internal_repo_rocksdb/repo/memory/arena.cc:135",
          "rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)::{lambda()#1}::operator()() const @ fbcode/internal_repo_rocksdb/repo/memory/concurrent_arena.h:65",
          "rocksdb::ConcurrentArena::AllocateImpl<rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)::{lambda()#1}>(unsigned long, bool, rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)::{lambda()#1} const&) @ fbcode/internal_repo_rocksdb/repo/memory/concurrent_arena.h:145",
          "rocksdb::ConcurrentArena::AllocateAligned @ fbcode/internal_repo_rocksdb/repo/memory/concurrent_arena.h:63",
          "rocksdb::InlineSkipList<rocksdb::MemTableRep::KeyComparator const&>::AllocateNode @ fbcode/internal_repo_rocksdb/repo/memtable/inlineskiplist.h:868",
          "rocksdb::InlineSkipList<rocksdb::MemTableRep::KeyComparator const&>::AllocateKey @ fbcode/internal_repo_rocksdb/repo/memtable/inlineskiplist.h:855",
          "rocksdb::(anonymous namespace)::SkipListRep::Allocate @ ./fbcode/internal_repo_rocksdb/repo/memtable/skiplistrep.cc:36",
          "rocksdb::MemTable::Add @ ./fbcode/internal_repo_rocksdb/repo/db/memtable.cc:1157"
        ]
      }
    }
  ],
  "ops_seen": 279,
  "critical_instructions_seen": 38
}
```

Differential Revision: D107999835
@hx235 hx235 changed the title Verify CPU corruption after each op via --verify_cpu_corruption_dir (#14852) Verify CPU corruption after op via --verify_cpu_corruption_dir (#14852) Jun 16, 2026
@hx235 hx235 requested a review from pdillinger June 16, 2026 20:54
hx235 pushed a commit to hx235/rocksdb that referenced this pull request Jun 18, 2026
…ook#14852)

Summary:

Detection layer of the CPU corruption injector (facebook#14858). With `--verify_cpu_corruption_dir=<dir>`, db_stress reads back the full keyspace after every write/manual flush/manual compaction op and compares it to the expected-values model, classifying any mismatch by `kind`: `lost` / `resurrected` / `wrong-value` (silent data corruption) or `detected-corruption` (a status/checksum-caught error). Each finding is written to `<dir>/data_corruption.<tid>.json` ({kind, cf, key, value_from_db, value_from_expected, op_status}) and routed through db_stress's standard `VerificationAbort` for a clean exit-1. A startup guard requires `--threads=1` and all fault injection off so the read-back is single-writer and the only corruption present is the injected one

Bonus: a minor refactoring into the surrounding error handling code in these ops 

**Test plan:**

1.Startup guard rejects misconfiguration:
```
--threads=2           -> exit 1: "--verify_cpu_corruption_dir requires --threads=1"
--read_fault_one_in=5 -> exit 1: "requires all fault injection off"
```
2.No false positive (clean CORE preset run, no injection):
```
$ db_stress --verify_cpu_corruption_dir=<dir> --threads=1 (full protections, all *_fault_one_in=0) ...
exit 0; no data_corruption.<tid>.json produced; "Verification successful"
```
3.Write-path cpu corruption injection (coming up, e.g, gdb flips a register inside MemTable::Add), then the immediate post-op read-back catches it. Real `<dir>/data_corruption.<tid>.json`:

silent data corruption -- write returned OK but the key is gone on read-back:
```
{"kind":"lost","cf":0,"key":9814,"value_from_db":"","value_from_expected":"010000000504070609080B0A0D0C0F0E","op_status":"Get: NotFound"}
```
detected corruption -- read-back Get returns Corruption via the memtable per-key checksum:
```
{"kind":"detected-corruption","cf":0,"key":139,"value_from_db":"","value_from_expected":"","op_status":"Get: Corruption: Corrupted memtable entry, per key-value checksum verification failed."
```

4.See PR [todo]'s spread in the outcome for verification of detection

Differential Revision: D107999834
@hx235 hx235 force-pushed the export-D107999834 branch from aafd57d to 5cad4a7 Compare June 18, 2026 22:13
meta-codesync Bot pushed a commit that referenced this pull request Jun 18, 2026
Summary:

Detection layer of the CPU corruption injector (#14858). With `--verify_cpu_corruption_dir=<dir>`, db_stress reads back the full keyspace after every write/manual flush/manual compaction op and compares it to the expected-values model, classifying any mismatch by `kind`: `lost` / `resurrected` / `wrong-value` (silent data corruption) or `detected-corruption` (a status/checksum-caught error). Each finding is written to `<dir>/data_corruption.<tid>.json` ({kind, cf, key, value_from_db, value_from_expected, op_status}) and routed through db_stress's standard `VerificationAbort` for a clean exit-1. A startup guard requires `--threads=1` and all fault injection off so the read-back is single-writer and the only corruption present is the injected one

Bonus: a minor refactoring into the surrounding error handling code in these ops 

**Test plan:**

1.Startup guard rejects misconfiguration:
```
--threads=2           -> exit 1: "--verify_cpu_corruption_dir requires --threads=1"
--read_fault_one_in=5 -> exit 1: "requires all fault injection off"
```
2.No false positive (clean CORE preset run, no injection):
```
$ db_stress --verify_cpu_corruption_dir=<dir> --threads=1 (full protections, all *_fault_one_in=0) ...
exit 0; no data_corruption.<tid>.json produced; "Verification successful"
```
3.Write-path cpu corruption injection (coming up, e.g, gdb flips a register inside MemTable::Add), then the immediate post-op read-back catches it. Real `<dir>/data_corruption.<tid>.json`:

silent data corruption -- write returned OK but the key is gone on read-back:
```
{"kind":"lost","cf":0,"key":9814,"value_from_db":"","value_from_expected":"010000000504070609080B0A0D0C0F0E","op_status":"Get: NotFound"}
```
detected corruption -- read-back Get returns Corruption via the memtable per-key checksum:
```
{"kind":"detected-corruption","cf":0,"key":139,"value_from_db":"","value_from_expected":"","op_status":"Get: Corruption: Corrupted memtable entry, per key-value checksum verification failed."
```

4.See PR [todo]'s spread in the outcome for verification of detection

Differential Revision: D107999834
meta-codesync Bot pushed a commit that referenced this pull request Jun 18, 2026
)

Summary:

This PR is the injection layer of the CPU corruption injector, runs inside gdb and randomly corrupts a register by bit flip in exactly one db_stress op (i.e, write, foreground compaction and flush) per stress test run. Detection layer is at db_stress (#14852); orchestration layer is coming up.

__How one run works__ 
- The orchestration layer, coming up, randomly picks which stress test `op` instance (so corruption can land at different points in the LSM shape journey) and which `target_fn` of that `op` (so to cap instructions to step under a reasonable limit; `injector.py` in this PR randomly picks which instruction within the `target_fn` to inject (so corruption can land at different points of a `target_fn`).
- Attach: gdb starts with injector.py's parameters passed via -iex and the db_stress command after --args, so db_stress runs unmodified. Example:
```
gdb --batch --nx \
  -iex "py import sys; sys.argv=['injector.py','--op','write','--op_index','42','--entry_fn','rocksdb::MemTable::Add','--target_fn','rocksdb::MemTable::Add','--corruptions_per_op','1','--seed','7','--dir','<rundir>']" \
  -x tools/cpu_corruption_injector/injector.py \
  --args <db_stress> --threads=1 --verify_cpu_corruption_dir=<rundir>  ...
```
- Navigate: The orchestration layer will pick op_index. `entry_fn` is called exactly once per stress test run's op so the op_index-th op is its op_index-th call. `injector_navigate.py` breaks on `entry_fn` and set a gdb ignore-count of op_index-1 to fast-forward to op_index-th one. It also breaks at the first `target_fn` within that `entry_fn`. 

- Warm up: `injector_critical_instruction.py` will choose "critical instruction" (those that move key/value bytes with general-purpose or vector registers or set a branch flag) uniformly within the chosen `target_fn` by the orchestration layer. In order to do that, it needs to approximate how many such instructions within the `target_fn`. Hence we have this warm-up phase. It single-steps the instruction within the first encoutering of `target_fn` to count and draw the critical instruction index, then corrupt that index at a later call. 

- Corrupt: on a later call of `target_fn`, `injector_critical_instruction.py` single-step to the m-th critical instruction and bit-flip the register through `injector_register_corruption.py`. The way to corrupt register depends on what instruction it is. If the current call of `target_fn`'s m-th instruction is not a critical instruction, we will try next `target_fn` till running out of `target_fn`.

- Record: `injector_telemetry.py` provides telemetry to capture the corruption for later analysis.


**Test plan:**
1. Isolated tests (real gdb-captured x/i fixtures): test_inject_critical_instruction

2. E2E test on navigation, inject, telemetry will be done in the later orchestration PR. Below is inject.json from such run
```
{
  "injection_result": "injected",
  "db_stress_crash_signal": null,
  "op": "write",
  "op_index": 279,
  "entry_fn": "rocksdb::MemTable::Add",
  "target_fn": "rocksdb::MemTable::Add",
  "critical_instruction_index": 37,
  "corruptions": [
    {
      "instruction": "mov    %rsi,0x8c8(%rbx)",
      "register": "rsi",
      "corruption_type": "bit_flip",
      "before": "0x7fffee4c64d8",
      "after": "0x7fffee4c64c8",
      "details": {
        "source": "rocksdb::Arena::AllocateAligned @ ./fbcode/internal_repo_rocksdb/repo/memory/arena.cc:135",
        "call_chain": [
          "rocksdb::Arena::AllocateAligned @ ./fbcode/internal_repo_rocksdb/repo/memory/arena.cc:135",
          "rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)::{lambda()#1}::operator()() const @ fbcode/internal_repo_rocksdb/repo/memory/concurrent_arena.h:65",
          "rocksdb::ConcurrentArena::AllocateImpl<rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)::{lambda()#1}>(unsigned long, bool, rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)::{lambda()#1} const&) @ fbcode/internal_repo_rocksdb/repo/memory/concurrent_arena.h:145",
          "rocksdb::ConcurrentArena::AllocateAligned @ fbcode/internal_repo_rocksdb/repo/memory/concurrent_arena.h:63",
          "rocksdb::InlineSkipList<rocksdb::MemTableRep::KeyComparator const&>::AllocateNode @ fbcode/internal_repo_rocksdb/repo/memtable/inlineskiplist.h:868",
          "rocksdb::InlineSkipList<rocksdb::MemTableRep::KeyComparator const&>::AllocateKey @ fbcode/internal_repo_rocksdb/repo/memtable/inlineskiplist.h:855",
          "rocksdb::(anonymous namespace)::SkipListRep::Allocate @ ./fbcode/internal_repo_rocksdb/repo/memtable/skiplistrep.cc:36",
          "rocksdb::MemTable::Add @ ./fbcode/internal_repo_rocksdb/repo/db/memtable.cc:1157"
        ]
      }
    }
  ],
  "ops_seen": 279,
  "critical_instructions_seen": 38
}
```

Differential Revision: D107999835
meta-codesync Bot pushed a commit that referenced this pull request Jun 18, 2026
Summary:

Detection layer of the CPU corruption injector (#14858). With `--verify_cpu_corruption_dir=<dir>`, db_stress reads back the full keyspace after every write/manual flush/manual compaction op and compares it to the expected-values model, classifying any mismatch by `kind`: `lost` / `resurrected` / `wrong-value` (silent data corruption) or `detected-corruption` (a status/checksum-caught error). Each finding is written to `<dir>/data_corruption.<tid>.json` ({kind, cf, key, value_from_db, value_from_expected, op_status}) and routed through db_stress's standard `VerificationAbort` for a clean exit-1. A startup guard requires `--threads=1` and all fault injection off so the read-back is single-writer and the only corruption present is the injected one

Bonus: a minor refactoring into the surrounding error handling code in these ops 

**Test plan:**

1.Startup guard rejects misconfiguration:
```
--threads=2           -> exit 1: "--verify_cpu_corruption_dir requires --threads=1"
--read_fault_one_in=5 -> exit 1: "requires all fault injection off"
```
2.No false positive (clean CORE preset run, no injection):
```
$ db_stress --verify_cpu_corruption_dir=<dir> --threads=1 (full protections, all *_fault_one_in=0) ...
exit 0; no data_corruption.<tid>.json produced; "Verification successful"
```
3.Write-path cpu corruption injection (coming up, e.g, gdb flips a register inside MemTable::Add), then the immediate post-op read-back catches it. Real `<dir>/data_corruption.<tid>.json`:

silent data corruption -- write returned OK but the key is gone on read-back:
```
{"kind":"lost","cf":0,"key":9814,"value_from_db":"","value_from_expected":"010000000504070609080B0A0D0C0F0E","op_status":"Get: NotFound"}
```
detected corruption -- read-back Get returns Corruption via the memtable per-key checksum:
```
{"kind":"detected-corruption","cf":0,"key":139,"value_from_db":"","value_from_expected":"","op_status":"Get: Corruption: Corrupted memtable entry, per key-value checksum verification failed."
```

4.See PR [todo]'s spread in the outcome for verification of detection

Differential Revision: D107999834
meta-codesync Bot pushed a commit that referenced this pull request Jun 18, 2026
)

Summary:

This PR is the injection layer of the CPU corruption injector, runs inside gdb and randomly corrupts a register by bit flip in exactly one db_stress op (i.e, write, foreground compaction and flush) per stress test run. Detection layer is at db_stress (#14852); orchestration layer is coming up.

__How one run works__ 
- The orchestration layer, coming up, randomly picks which stress test `op` instance (so corruption can land at different points in the LSM shape journey) and which `target_fn` of that `op` (so to cap instructions to step under a reasonable limit; `injector.py` in this PR randomly picks which instruction within the `target_fn` to inject (so corruption can land at different points of a `target_fn`).
- Attach: gdb starts with injector.py's parameters passed via -iex and the db_stress command after --args, so db_stress runs unmodified. Example:
```
gdb --batch --nx \
  -iex "py import sys; sys.argv=['injector.py','--op','write','--op_index','42','--entry_fn','rocksdb::MemTable::Add','--target_fn','rocksdb::MemTable::Add','--corruptions_per_op','1','--seed','7','--dir','<rundir>']" \
  -x tools/cpu_corruption_injector/injector.py \
  --args <db_stress> --threads=1 --verify_cpu_corruption_dir=<rundir>  ...
```
- Navigate: The orchestration layer will pick op_index. `entry_fn` is called exactly once per stress test run's op so the op_index-th op is its op_index-th call. `injector_navigate.py` breaks on `entry_fn` and set a gdb ignore-count of op_index-1 to fast-forward to op_index-th one. It also breaks at the first `target_fn` within that `entry_fn`. 

- Warm up: `injector_critical_instruction.py` will choose "critical instruction" (those that move key/value bytes with general-purpose or vector registers or set a branch flag) uniformly within the chosen `target_fn` by the orchestration layer. In order to do that, it needs to approximate how many such instructions within the `target_fn`. Hence we have this warm-up phase. It single-steps the instruction within the first encoutering of `target_fn` to count and draw the critical instruction index, then corrupt that index at a later call. 

- Corrupt: on a later call of `target_fn`, `injector_critical_instruction.py` single-step to the m-th critical instruction and bit-flip the register through `injector_register_corruption.py`. The way to corrupt register depends on what instruction it is. If the current call of `target_fn`'s m-th instruction is not a critical instruction, we will try next `target_fn` till running out of `target_fn`.

- Record: `injector_telemetry.py` provides telemetry to capture the corruption for later analysis.


**Test plan:**
1. Isolated tests (real gdb-captured x/i fixtures): test_inject_critical_instruction

2. E2E test on navigation, inject, telemetry will be done in the later orchestration PR. Below is inject.json from such run
```
{
  "injection_result": "injected",
  "db_stress_crash_signal": null,
  "op": "write",
  "op_index": 279,
  "entry_fn": "rocksdb::MemTable::Add",
  "target_fn": "rocksdb::MemTable::Add",
  "critical_instruction_index": 37,
  "corruptions": [
    {
      "instruction": "mov    %rsi,0x8c8(%rbx)",
      "register": "rsi",
      "corruption_type": "bit_flip",
      "before": "0x7fffee4c64d8",
      "after": "0x7fffee4c64c8",
      "details": {
        "source": "rocksdb::Arena::AllocateAligned @ ./fbcode/internal_repo_rocksdb/repo/memory/arena.cc:135",
        "call_chain": [
          "rocksdb::Arena::AllocateAligned @ ./fbcode/internal_repo_rocksdb/repo/memory/arena.cc:135",
          "rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)::{lambda()#1}::operator()() const @ fbcode/internal_repo_rocksdb/repo/memory/concurrent_arena.h:65",
          "rocksdb::ConcurrentArena::AllocateImpl<rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)::{lambda()#1}>(unsigned long, bool, rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)::{lambda()#1} const&) @ fbcode/internal_repo_rocksdb/repo/memory/concurrent_arena.h:145",
          "rocksdb::ConcurrentArena::AllocateAligned @ fbcode/internal_repo_rocksdb/repo/memory/concurrent_arena.h:63",
          "rocksdb::InlineSkipList<rocksdb::MemTableRep::KeyComparator const&>::AllocateNode @ fbcode/internal_repo_rocksdb/repo/memtable/inlineskiplist.h:868",
          "rocksdb::InlineSkipList<rocksdb::MemTableRep::KeyComparator const&>::AllocateKey @ fbcode/internal_repo_rocksdb/repo/memtable/inlineskiplist.h:855",
          "rocksdb::(anonymous namespace)::SkipListRep::Allocate @ ./fbcode/internal_repo_rocksdb/repo/memtable/skiplistrep.cc:36",
          "rocksdb::MemTable::Add @ ./fbcode/internal_repo_rocksdb/repo/db/memtable.cc:1157"
        ]
      }
    }
  ],
  "ops_seen": 279,
  "critical_instructions_seen": 38
}
```

Differential Revision: D107999835
…ook#14852)

Summary:

Detection layer of the CPU corruption injector (facebook#14858). With `--verify_cpu_corruption_dir=<dir>`, db_stress reads back the full keyspace after every write/manual flush/manual compaction op and compares it to the expected-values model, classifying any mismatch by `kind`: `lost` / `resurrected` / `wrong-value` (silent data corruption) or `detected-corruption` (a status/checksum-caught error). Each finding is written to `<dir>/data_corruption.<tid>.json` ({kind, cf, key, value_from_db, value_from_expected, op_status}) and routed through db_stress's standard `VerificationAbort` for a clean exit-1. A startup guard requires `--threads=1` and all fault injection off so the read-back is single-writer and the only corruption present is the injected one

Bonus: a minor refactoring into the surrounding error handling code in these ops 

**Test plan:**

1.Startup guard rejects misconfiguration:
```
--threads=2           -> exit 1: "--verify_cpu_corruption_dir requires --threads=1"
--read_fault_one_in=5 -> exit 1: "requires all fault injection off"
```
2.No false positive (clean CORE preset run, no injection):
```
$ db_stress --verify_cpu_corruption_dir=<dir> --threads=1 (full protections, all *_fault_one_in=0) ...
exit 0; no data_corruption.<tid>.json produced; "Verification successful"
```
3.Write-path cpu corruption injection (coming up, e.g, gdb flips a register inside MemTable::Add), then the immediate post-op read-back catches it. Real `<dir>/data_corruption.<tid>.json`:

silent data corruption -- write returned OK but the key is gone on read-back:
```
{"kind":"lost","cf":0,"key":9814,"value_from_db":"","value_from_expected":"010000000504070609080B0A0D0C0F0E","op_status":"Get: NotFound"}
```
detected corruption -- read-back Get returns Corruption via the memtable per-key checksum:
```
{"kind":"detected-corruption","cf":0,"key":139,"value_from_db":"","value_from_expected":"","op_status":"Get: Corruption: Corrupted memtable entry, per key-value checksum verification failed."
```

4.See PR facebook#14866 test plan's spread in the outcome for verification of detection

Differential Revision: D107999834
@hx235 hx235 force-pushed the export-D107999834 branch from 5cad4a7 to dbc0912 Compare June 18, 2026 22:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant