Skip to content

Reduce the number of payload copy between rust and python#299

Merged
fpacifici merged 2 commits intomainfrom
fpacifici/optimize_messages
Apr 23, 2026
Merged

Reduce the number of payload copy between rust and python#299
fpacifici merged 2 commits intomainfrom
fpacifici/optimize_messages

Conversation

@fpacifici
Copy link
Copy Markdown
Collaborator

@fpacifici fpacifici commented Apr 22, 2026

I ran a number of load tests to understand where the bottlenecks of streams were.
It turns out that a major component is the repeated copy of payload and headers between rust memory and python memory.

The current python message contains a number of properties like payload and headers.
Each of them delegates the execution to the rust object.
Problem: referencing the payload attribute on the python message means calling a rust function that takes the GIL and copies the content of the payload from rust memory into python memory. This happens every time python code references the payload. Which is tons of times when processing the message.
Worse: when we reference headers, the headers are unpacked from librdkafka, parsed and made into a python sequence taking the GIL. Every single time.

Adding a caching layer inside the message so that we copy payload, headers, timestamp and schema only once.
This should already show an improvement. Then I will try to exclude the headers from the picture entirely.

@fpacifici fpacifici marked this pull request as ready for review April 22, 2026 21:10
@fpacifici fpacifici requested a review from a team as a code owner April 22, 2026 21:10
@mwarkentin
Copy link
Copy Markdown
Member

Do you have any data from your tests that you can link here? Would be good to capture the artifact.

@fpacifici
Copy link
Copy Markdown
Collaborator Author

Do you have any data from your tests that you can link here? Would be good to capture the artifact.

I still have to compile data from the sandbox tests.
But:
Screenshot 2026-04-22 at 2 22 44 PM

This represents the throughput of a test case I ran in a sandbox environment.
The runs per second of arroyo is proportional to the messages per second we process as there is no async processing in the test case.
The high peaks are those without multiple payload copies. The low peaks are those where I executed all the current logic

This is where the test code is main...fpacifici/profile_filter


def __repr__(self) -> str:
return f"PyMessage({self.inner.__repr__()})"
return (
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to call to_inner and then repr that inner object directly?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't that trigger the expensive allocation?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's the reason. Instantiating the rust object causes memory copy, that is what caused the issue.

Copy link
Copy Markdown
Member

@markstory markstory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me.


def __repr__(self) -> str:
return f"PyMessage({self.inner.__repr__()})"
return (
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't that trigger the expensive allocation?

@fpacifici fpacifici merged commit 576c034 into main Apr 23, 2026
22 checks passed
fpacifici added a commit that referenced this pull request Apr 23, 2026
BUilt on top of #299

One of the main throughput issues we observed in the tests mentioned in
#299 is that headers are 
really expensive to pass to and from rust. Most of the issue seems
related with
https://github.com/getsentry/streams/blob/main/sentry_streams/src/messages.rs#L47-L82.
Specifically this code is executed every time python references
headers.

This is a test to confirm the impact by removing all the headers related
logic.
All messages are created with empty headers, the header fetching logic 
is never called.
I left the support for headers in the rust code because the consumer
to test relies on a header filter so I cannto remove it entirely.

This change is suppsoed to be reverted after we perform a production
test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants