Skip to content

fix: deep copy eBPF data for each syscall sub-event to prevent memory issues#770

Open
yugal07 wants to merge 1 commit intokubescape:mainfrom
yugal07:fix-deepcopy
Open

fix: deep copy eBPF data for each syscall sub-event to prevent memory issues#770
yugal07 wants to merge 1 commit intokubescape:mainfrom
yugal07:fix-deepcopy

Conversation

@yugal07
Copy link
Copy Markdown

@yugal07 yugal07 commented Apr 2, 2026

copying eBPF data for each sub-event

Overview

When the syscall tracer gets an event, it fans it out into multiple sub-events (one per syscall).
The problem was all those sub-events were sharing the same eBPF data pointer — there was even
a WARNING comment in the code about it. So when one sub-event released the data, the rest were
left pointing at freed memory.

To patch over this, Release() was being skipped for syscall events entirely in the worker pool,
which just traded a use-after-free for a memory leak.

The fix: each sub-event now gets its own deep copy via event.Datasource.DeepCopy(event.Data),
and the original is released after all sub-events are created. With that in place, the special-case
skip of Release() in the worker pool isn't needed anymore, so that's removed too.

Files Changes

  • pkg/containerwatcher/v2/tracers/syscall.go — deep copy data per sub-event, release original after
  • pkg/containerwatcher/v2/container_watcher.go — remove syscall-specific Release() skip

How to Test

Run the existing tracer tests:
go test ./pkg/containerwatcher/v2/tracers/..

Summary by CodeRabbit

  • Bug Fixes
    • Improved event resource management by ensuring all events are properly cleaned up after processing, regardless of type, enhancing overall system stability and reliability.

  copying eBPF data for each sub-event

Signed-off-by: yugal07 <yashsadhwani544@gmail.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 2, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5c73ccbf-0302-4331-9f6f-d7bdb241722e

📥 Commits

Reviewing files that changed from the base of the PR and between 9a09420 and 30cc6bf.

📒 Files selected for processing (2)
  • pkg/containerwatcher/v2/container_watcher.go
  • pkg/containerwatcher/v2/tracers/syscall.go

📝 Walkthrough

Walkthrough

The PR refactors event lifecycle management: the syscall tracer now creates independent deep copies of event data for each sub-event and releases the original after processing, while the container watcher unconditionally releases all events after handling, including syscall events, removing prior conditional skip logic.

Changes

Cohort / File(s) Summary
Container Worker Event Release
pkg/containerwatcher/v2/container_watcher.go
Removed utils dependency and made event release unconditional; enrichedEvent.Event.Release() is now called for all event types instead of skipping syscall events, simplifying control flow.
Syscall Tracer Data Isolation
pkg/containerwatcher/v2/tracers/syscall.go
Implemented per-sub-event data isolation using DeepCopy() and added event.Release() call to clean up original data after processing all sub-events.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related issues

Poem

🐰 A tale of careful cleanup and copy,
Where syscalls dance with data quite poppy,
Deep copies made, releases flow true,
No leaks shall slip through — old or new!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: deep copying eBPF data for syscall sub-events to prevent memory issues, which directly matches the core objective of fixing memory safety.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@yugal07
Copy link
Copy Markdown
Author

yugal07 commented Apr 2, 2026

@matthyx Please Take a look. Thanks

if enrichedEvent.Event.GetEventType() != utils.SyscallEventType {
enrichedEvent.Event.Release() // at this time we should not need the event anymore
}
enrichedEvent.Event.Release() // at this time we should not need the event anymore
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we sure we can drop the condition here?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In callback:
Data: event.Datasource.DeepCopy(event.Data), // each sub-event owns its own copy
...
event.Release() // original released here, after all sub-events are created

Each sub-event now holds an independent deep copy. When the worker pool calls:
enrichedEvent.Event.Release()
...it's releasing that sub-event's own copy, not shared memory. No other sub-event is affected.

So yes, dropping the condition is correct and intentional. The condition was only there to paper over the
shared-pointer problem. Now that every sub-event owns its data, the worker pool can safely call Release()
uniformly for all event types.

What do you think?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah right

@matthyx matthyx requested a review from YakirOren April 2, 2026 13:24
@matthyx
Copy link
Copy Markdown
Contributor

matthyx commented Apr 2, 2026

@YakirOren can you check your benchmarks to see the impact?

@YakirOren
Copy link
Copy Markdown
Contributor

heap-with-fix.prof.gz - quay.io/yakiro/node-agent:yugal-fix-deepcopy
heap.prof.gz - latest image

both ran for 10m while a script that generates syscall anomalies was running
test-syscalls.sh

Metric Baseline With Fix Delta Change
inuse_space 12,025 kB 8,902 kB -3,123 kB -26.0%
inuse_objects 107,028 82,701 -24,327 -22.7%
alloc_space 6,826 MB 7,593 MB +767 MB +11.2%
alloc_objects 77.5M 85.5M +8.0M +10.3%

@yugal07
Copy link
Copy Markdown
Author

yugal07 commented Apr 2, 2026

heap-with-fix.prof.gz - quay.io/yakiro/node-agent:yugal-fix-deepcopy heap.prof.gz - latest image

both ran for 10m while a script that generates syscall anomalies was running test-syscalls.sh
Metric Baseline With Fix Delta Change
inuse_space 12,025 kB 8,902 kB -3,123 kB -26.0%
inuse_objects 107,028 82,701 -24,327 -22.7%
alloc_space 6,826 MB 7,593 MB +767 MB +11.2%
alloc_objects 77.5M 85.5M +8.0M +10.3%

the shell script seems light, In heavy production environments, I think the inuse space might drop more than 26 percent, and even more pronounced since the leaked objects would accumulate faster.

Are these benchmarks good? There seems to be a tradeoff happening

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

3 participants