Reduce the number of payload copy between rust and python#299
Reduce the number of payload copy between rust and python#299
Conversation
|
Do you have any data from your tests that you can link here? Would be good to capture the artifact. |
I still have to compile data from the sandbox tests. This represents the throughput of a test case I ran in a sandbox environment. This is where the test code is main...fpacifici/profile_filter |
|
|
||
| def __repr__(self) -> str: | ||
| return f"PyMessage({self.inner.__repr__()})" | ||
| return ( |
There was a problem hiding this comment.
Is there a reason not to call to_inner and then repr that inner object directly?
There was a problem hiding this comment.
Wouldn't that trigger the expensive allocation?
There was a problem hiding this comment.
Yes, that's the reason. Instantiating the rust object causes memory copy, that is what caused the issue.
|
|
||
| def __repr__(self) -> str: | ||
| return f"PyMessage({self.inner.__repr__()})" | ||
| return ( |
There was a problem hiding this comment.
Wouldn't that trigger the expensive allocation?
BUilt on top of #299 One of the main throughput issues we observed in the tests mentioned in #299 is that headers are really expensive to pass to and from rust. Most of the issue seems related with https://github.com/getsentry/streams/blob/main/sentry_streams/src/messages.rs#L47-L82. Specifically this code is executed every time python references headers. This is a test to confirm the impact by removing all the headers related logic. All messages are created with empty headers, the header fetching logic is never called. I left the support for headers in the rust code because the consumer to test relies on a header filter so I cannto remove it entirely. This change is suppsoed to be reverted after we perform a production test

I ran a number of load tests to understand where the bottlenecks of
streamswere.It turns out that a major component is the repeated copy of payload and headers between rust memory and python memory.
The current python message contains a number of properties like payload and headers.
Each of them delegates the execution to the rust object.
Problem: referencing the
payloadattribute on the python message means calling a rust function that takes the GIL and copies the content of the payload from rust memory into python memory. This happens every time python code references the payload. Which is tons of times when processing the message.Worse: when we reference headers, the headers are unpacked from librdkafka, parsed and made into a python sequence taking the GIL. Every single time.
Adding a caching layer inside the message so that we copy payload, headers, timestamp and schema only once.
This should already show an improvement. Then I will try to exclude the headers from the picture entirely.