Reduce the number of payload copy between rust and python by fpacifici · Pull Request #299 · getsentry/streams

fpacifici · 2026-04-22T20:58:03Z

I ran a number of load tests to find the bottlenecks in streams.
A major one turns out to be the repeated copying of payload and headers between Rust memory and Python memory.

The current Python message exposes a number of properties, such as payload and headers.
Each of them delegates to the Rust object.
Problem: referencing the payload attribute on the Python message calls a Rust function that takes the GIL and copies the payload from Rust memory into Python memory. This happens every time Python code references the payload, which is many times while processing a single message.
Worse: when we reference headers, they are unpacked from librdkafka, parsed, and turned into a Python sequence, taking the GIL. Every single time.
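
The access pattern described above can be sketched in pure Python. This is only an illustration: `FakeRustMessage` and `UncachedMessage` are hypothetical stand-ins, not the actual sentry_streams classes, and the counters merely simulate the cross-language copy and the header parsing.

```python
class FakeRustMessage:
    """Hypothetical stand-in for the Rust-backed object (not the real type)."""

    def __init__(self, payload: bytes, headers: list[tuple[str, bytes]]) -> None:
        self._payload = payload
        self._headers = headers
        self.payload_copies = 0
        self.header_parses = 0

    def get_payload(self) -> bytes:
        # Simulates copying the payload from Rust memory into Python memory.
        self.payload_copies += 1
        return bytes(bytearray(self._payload))

    def get_headers(self) -> list[tuple[str, bytes]]:
        # Simulates unpacking and parsing the headers on every call.
        self.header_parses += 1
        return [(key, bytes(value)) for key, value in self._headers]


class UncachedMessage:
    """Every property access delegates to the inner object."""

    def __init__(self, inner: FakeRustMessage) -> None:
        self.inner = inner

    @property
    def payload(self) -> bytes:
        return self.inner.get_payload()

    @property
    def headers(self) -> list[tuple[str, bytes]]:
        return self.inner.get_headers()


inner = FakeRustMessage(b"x" * 1024, [("k", b"v")])
msg = UncachedMessage(inner)
for _ in range(5):
    _ = msg.payload
    _ = msg.headers

print(inner.payload_copies, inner.header_parses)  # 5 5
```

Each of the five reads triggers a fresh copy and a fresh parse, which is the behaviour the load tests pointed at.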

This PR adds a caching layer inside the message so that payload, headers, timestamp and schema are each copied only once.
This should already show an improvement. Then I will try to exclude the headers from the picture entirely.
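
A minimal sketch of what such a caching layer could look like, assuming the expensive getters live on an inner object. `CountingInner` and `CachedMessage` are hypothetical names, and `functools.cached_property` is just one way to do this in Python; the PR itself may implement the cache differently. Only payload and headers are shown, but timestamp and schema would follow the same pattern.

```python
from functools import cached_property


class CountingInner:
    """Hypothetical stand-in for the Rust object; counts expensive calls."""

    def __init__(self, payload: bytes, headers: list[tuple[str, bytes]]) -> None:
        self._payload = payload
        self._headers = headers
        self.payload_copies = 0
        self.header_parses = 0

    def get_payload(self) -> bytes:
        self.payload_copies += 1
        return bytes(bytearray(self._payload))

    def get_headers(self) -> list[tuple[str, bytes]]:
        self.header_parses += 1
        return list(self._headers)


class CachedMessage:
    """Fetches each field from the inner object at most once."""

    def __init__(self, inner: CountingInner) -> None:
        self._inner = inner

    @cached_property
    def payload(self) -> bytes:
        # Runs on first access only; the result is stored on the instance.
        return self._inner.get_payload()

    @cached_property
    def headers(self) -> list[tuple[str, bytes]]:
        return self._inner.get_headers()


counting_inner = CountingInner(b"payload", [("k", b"v")])
cached = CachedMessage(counting_inner)
for _ in range(1000):
    _ = cached.payload
    _ = cached.headers

print(counting_inner.payload_copies, counting_inner.header_parses)  # 1 1
```

The trade-off is that the Python-side copy stays alive for as long as the message object does, which is usually acceptable when the message is read many times during processing.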

mwarkentin · 2026-04-22T21:17:56Z

Do you have any data from your tests that you can link here? Would be good to capture the artifact.

fpacifici · 2026-04-22T21:24:58Z

> Do you have any data from your tests that you can link here? Would be good to capture the artifact.

I still have to compile data from the sandbox tests.
But:

The attached chart shows the throughput of a test case I ran in a sandbox environment.
Arroyo's runs per second are proportional to the messages per second we process, as there is no async processing in the test case.
The high peaks are the runs without multiple payload copies. The low peaks are the runs where I executed all the current logic.

The test code is here: main...fpacifici/profile_filter

evanh · 2026-04-23T14:02:53Z


    def __repr__(self) -> str:
-        return f"PyMessage({self.inner.__repr__()})"
+        return (


Is there a reason not to call to_inner and then repr that inner object directly?

markstory · 2026-04-23T15:02:25Z

Wouldn't that trigger the expensive allocation?

fpacifici

Yes, that's the reason. Instantiating the Rust object causes a memory copy, which is what caused the issue.

markstory

Makes sense to me.
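
To illustrate the point made in this thread, here is a hedged sketch of a `__repr__` that formats the already-cached fields instead of rebuilding the inner object. The diff above is truncated, so the format string, the field names and the `to_inner` body below are assumptions for illustration, not the code that was merged.

```python
class SketchMessage:
    """Hypothetical message holding cached Python-side copies of its fields."""

    def __init__(self, payload, headers, timestamp, schema) -> None:
        self.payload = payload
        self.headers = headers
        self.timestamp = timestamp
        self.schema = schema
        self.inner_builds = 0

    def to_inner(self):
        # Stand-in for instantiating the Rust object, which copies memory.
        self.inner_builds += 1
        return (bytes(bytearray(self.payload)), list(self.headers))

    def __repr__(self) -> str:
        # Formats the cached fields directly; never calls to_inner().
        return (
            f"PyMessage(payload={self.payload!r}, headers={self.headers!r}, "
            f"timestamp={self.timestamp!r}, schema={self.schema!r})"
        )


sketch = SketchMessage(b"hello", [("k", b"v")], 1.0, None)
text = repr(sketch)
print(text)
print(sketch.inner_builds)  # 0
```

Logging or debugging a message therefore costs no extra cross-language copy in this sketch.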

Built on top of #299. One of the main throughput issues we observed in the tests mentioned in #299 is that headers are really expensive to pass to and from Rust. Most of the issue seems related to https://github.com/getsentry/streams/blob/main/sentry_streams/src/messages.rs#L47-L82. Specifically, this code is executed every time Python references headers. This is a test to confirm the impact by removing all the header-related logic. All messages are created with empty headers, and the header fetching logic is never called. I left the support for headers in the Rust code because the consumer we are testing relies on a header filter, so I cannot remove it entirely. This change is supposed to be reverted after we perform a production test.

Optimize message

c2b478f

fpacifici marked this pull request as ready for review April 22, 2026 21:10

fpacifici requested a review from a team as a code owner April 22, 2026 21:10

fpacifici mentioned this pull request Apr 22, 2026

Test skip processing headers #301

Merged

Revert the change in headers

54809b8

evanh approved these changes Apr 23, 2026

markstory approved these changes Apr 23, 2026

fpacifici merged commit 576c034 into main Apr 23, 2026
22 checks passed

sentry-release-bot Bot mentioned this pull request Apr 23, 2026

publish: getsentry/streams/sentry_streams@0.0.45 getsentry/publish#7924

Closed

3 tasks

fpacifici merged 2 commits into main from fpacifici/optimize_messages

4 participants
