Skip to content

Check malloc return values in C API (#13099)#14879

Open
sg20180546 wants to merge 1 commit into
facebook:mainfrom
sg20180546:fix-malloc-null-checks-c-api
Open

Check malloc return values in C API (#13099)#14879
sg20180546 wants to merge 1 commit into
facebook:mainfrom
sg20180546:fix-malloc-null-checks-c-api

Conversation

@sg20180546

Copy link
Copy Markdown
Contributor

Summary

Fixes part of #13099.

Several C API functions in db/c.cc allocate buffers with malloc() and immediately dereference the result in the following loop without checking for failure. On a low-memory system malloc() can return NULL, which leads to a null-pointer dereference and a crash.

This PR adds NULL checks to the three call sites that already take an errptr parameter, so the allocation failure can be propagated cleanly. The handling mirrors the existing pattern already used by rocksdb_open_and_compact (report Status::MemoryLimit through errptr and return nullptr).

Call sites fixed

  • rocksdb_list_column_families — sets *lencfs = 0, reports the error and returns nullptr.
  • rocksdb_create_column_families — the underlying column families have already been created at this point, so on allocation failure it deletes the created handles to avoid leaking them before returning.
  • rocksdb_load_latest_options — has two allocations; if either fails it frees the other buffer (free(nullptr) is a safe no-op), resets all out-params, reports the error and returns.

Scope

This intentionally covers only the call sites that have an errptr to propagate the failure, keeping the change small and consistent with the existing error-handling pattern. Other unchecked malloc sites mentioned in the issue (e.g. functions without an errptr, the CopyString helper, and cache/lru_cache.cc) can be addressed in follow-ups since they need different handling.

Test plan

  • make format-auto / build_tools/format-diff.sh -c — clean.
  • make check-sources — passes.
  • db/c.o compiles.

The new branches are only taken under allocation failure, which is not exercised by the existing functional tests; the change preserves existing behavior on the success path.

Several C API functions allocate buffers with malloc() and immediately
dereference the result without checking for failure. On a low-memory
system malloc() can return NULL, causing a null-pointer dereference and
crash in the following loop.

Add NULL checks to the three call sites that have an errptr to propagate
the failure, mirroring the existing pattern used by
rocksdb_open_and_compact:

- rocksdb_list_column_families
- rocksdb_create_column_families (also frees the already-created column
  family handles to avoid leaking them on allocation failure)
- rocksdb_load_latest_options (frees the first buffer if the second
  allocation fails)

On failure these now report Status::MemoryLimit through errptr and return
nullptr (with the out-params reset) instead of crashing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@meta-cla meta-cla Bot added the CLA Signed label Jun 24, 2026
@github-actions

Copy link
Copy Markdown

✅ clang-tidy: No findings on changed lines

Completed in 52.0s.

@github-actions

Copy link
Copy Markdown

🟡 Codex Code Review

Auto-triggered after CI passed — reviewing commit f377050


Codex review failed before producing findings.

WARNING: proceeding, even though we could not create PATH aliases: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

  • Codex may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /codex-review [context] — Request a code review
  • /codex-query <question> — Ask about the PR or codebase

@github-actions

Copy link
Copy Markdown

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit f377050


Summary

Clean, well-scoped defensive fix that adds NULL checks after three malloc() call sites in the C API, preventing null-pointer dereference crashes under OOM. The approach correctly follows the existing Status::MemoryLimit + errptr pattern established by rocksdb_open_and_compact. One medium-severity concern regarding orphaned column families in the DB on the rocksdb_create_column_families error path, and a few minor observations.

High-severity findings (0):
No high-severity findings.

Full review (click to expand)

Findings

🔴 HIGH

None.

🟡 MEDIUM

M1. Orphaned column families in rocksdb_create_column_familiesdb/c.cc:1812
  • Issue: When malloc fails after CreateColumnFamilies succeeds, the error path does delete handles[i] which calls ColumnFamilyHandleImpl::~ColumnFamilyHandleImpl(). The destructor (db/column_family.cc:56) calls cfd_->UnrefAndTryDelete() — this unrefs the ColumnFamilyData but does not drop the column family from the DB. The column families persist in the DB but the caller receives nullptr and has no handles to access or drop them.
  • Root cause: delete handle != DropColumnFamily. The CFs remain in the DB metadata and will appear in subsequent ListColumnFamilies calls, consuming resources with no way for the caller to know they were created.
  • Suggested fix: This is arguably the least-bad option (alternative is a NULL-deref crash), but it should be documented with a comment explaining the trade-off. Consider whether db->rep->DropColumnFamily(handles[i]) should be called before delete handles[i] to fully clean up. A comment explaining the trade-off would suffice.
M2. SaveError itself may fail under OOM — db/c.cc:661
  • Issue: SaveError calls strdup() internally (line 661). Under OOM, strdup will also likely fail, setting *errptr to NULL — indistinguishable from "no error" for callers that only check errptr.
  • Root cause: Pre-existing limitation of the SaveError pattern; not introduced by this PR.
  • Suggested fix: Not actionable in this PR's scope. The functions still return nullptr / set *lencfs = 0, so callers checking the return value will still detect failure.

🟢 LOW / NIT

L1. malloc(0) edge case in rocksdb_list_column_familiesdb/c.cc:1763
  • Issue: If ListColumnFamilies fails, fams is empty, malloc(0) may return NULL, triggering a spurious MemoryLimit error that replaces the real error already saved by SaveError. Pre-existing issue (function doesn't early-return on ListColumnFamilies failure).
L2. Unchecked strdup in success-path loops — db/c.cc:1765, db/c.cc:3646
  • Issue: strdup() calls in the for-loops are also unchecked. Acknowledged as out-of-scope in the PR description.

Cross-Component Analysis

All three functions are cold-path admin operations (DB setup, column family management, options loading). No hot-path performance impact. The change is backwards compatible — callers that didn't check for NULL were already crashing; now they get a clean error. All existing test callers (c_test.c) check err before dereferencing return values, so the new error paths are safe.

Positive Observations

  • Minimal, well-scoped change following established Status::MemoryLimit pattern
  • Correct cross-freeing logic in rocksdb_load_latest_options (free(NULL) is safe)
  • Thorough PR description accurately describing trade-offs
  • *lencfs = 0 set before return prevents callers from iterating over null arrays

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant