Skip to content

Fix acting[] array out-of-bounds access in radostrace#87

Merged
pponnuvel merged 2 commits intotaodd:mainfrom
xtrusia:fix-acting-array-oob
Mar 31, 2026
Merged

Fix acting[] array out-of-bounds access in radostrace#87
pponnuvel merged 2 commits intotaodd:mainfrom
xtrusia:fix-acting-array-oob

Conversation

@xtrusia
Copy link
Copy Markdown
Contributor

@xtrusia xtrusia commented Mar 18, 2026

acting[] was originally declared as acting[8] in commit a224791, with matching loop bounds of i < 8. Later, commit ada1407 reduced the array to acting[6] to make room for cls_ops[3], but forgot to update the loop bounds in radostrace.cc and radostrace.bpf.c, causing OOB access.

@xtrusia xtrusia force-pushed the fix-acting-array-oob branch from 5e275f3 to ae72814 Compare March 28, 2026 03:50
Copy link
Copy Markdown
Owner

@taodd taodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me, some minor suggestions

xtrusia added 2 commits March 30, 2026 23:27
The acting[] array in client_op_v was defined with a fixed size of 6,
but the loop iterating over it used a hardcoded bound of 8, causing
out-of-bounds reads. Beyond fixing the mismatch, the array was too
small for real Ceph deployments.

Ceph caps the number of OSDs per PG at CEPH_PG_MAX_SIZE (16), defined
in src/include/rados.h. This applies to both replicated pools (max
size=10 enforced by OSDMonitor) and erasure-coded pools (k+m bounded
by CEPH_PG_MAX_SIZE). The Linux kernel client uses the same constant
to size its acting set arrays (struct ceph_osds in osdmap.h).

Changes:
- Define MAX_ACTING=16 macro in bpf_ceph_types.h, matching
  CEPH_PG_MAX_SIZE from Ceph's rados.h
- Use MAX_ACTING for acting[] array size and all loop bounds
- Move initialize_value() stack allocation to a .bss global
  (zero_val) to avoid BPF 512-byte stack limit, since client_op_v
  with acting[16] is ~434 bytes

Signed-off-by: Seyeong Kim <seyeong.kim@canonical.com>
bpf_ringbuf_reserve() was using sizeof(struct op_v) but the code
copies a struct client_op_v into the reserved buffer. With the
acting[] array expansion this became a detectable OOB write caught
by the BPF verifier.

Signed-off-by: Seyeong Kim <seyeong.kim@canonical.com>
@xtrusia xtrusia force-pushed the fix-acting-array-oob branch from fe5f6bf to 36026aa Compare March 30, 2026 23:28
@xtrusia xtrusia force-pushed the fix-acting-array-oob branch from 9a1fc27 to 36026aa Compare March 31, 2026 01:11
Copy link
Copy Markdown
Collaborator

@pponnuvel pponnuvel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @xtrusia!

@pponnuvel pponnuvel merged commit 18cd218 into taodd:main Mar 31, 2026
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants