feat(rust_streams): PipelineStats and Batch/Filter metrics#309
Conversation
Introduce pipeline_stats.rs mirroring Python PipelineStats: buffered per-step exec, error, and max timing, flushed every 10s via the metrics crate (streams.pipeline.*) with a step label. No Arroyo dependency for recording. Plumb step_name through RuntimeOperator Batch/Filter. BatchStep records step_exec per row and step_timing on flush; Filter records around the Python predicate. Remove duplicate Python metrics from PredicateFilter. Rust tests cover aggregation and throttling; use scripts/rust-envvars with cargo test per Makefile. Made-with: Cursor
Put step_name before optional Batch parameters so rust_streams.pyi is valid Python (required args before defaults). Apply cargo fmt to satisfy the lint job. Made-with: Cursor
```rust
impl PipelineStats {
    fn with_flush_interval(flush_interval: Duration) -> Self {
        Self {
            inner: Mutex::new(Inner::new()),
```
Why do we need the additional buffering? Couldn't the metrics backend take care of this?
We used to do that, and it turned out to be a major performance issue (addressed in the Python version in #305).
In a nutshell:
- Multiple stats are collected at every step of the pipeline, including trivial steps, which makes the collection extremely high-throughput.
- The metrics backend does additional work (see the profile attached to the linked PR), such as tag normalization, because it has to stay general. It does not know there is only one tag; that ended up being the culprit of the overhead.
- Stats are much more lightweight: on every call we just bump a counter, which is viable.

Now, the Rust version holds its state in a mutex, which may turn out to be a problem; I will profile that part. The reason we need a lightweight, buffered version of the metrics is the one described in the Python PR. We did not retrofit the buffering into the metrics backend because we get speed in exchange for flexibility in the stats abstraction, and we want to preserve that flexibility in the backend.
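The buffering idea discussed here can be sketched as follows. This is a hypothetical, simplified illustration, not the actual rust_streams `PipelineStats`: the names `StatsBuffer`, `Inner`, `bump`, and `flush_if_due` are made up, and the real code flushes through the `metrics` crate with a `step` label. The point is only that the hot path bumps an in-memory counter behind a mutex, and the backend sees one aggregated value per step per flush window.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};

// Per-step counters accumulated between flushes.
struct Inner {
    counts: HashMap<String, u64>,
    last_flush: Instant,
}

// Illustrative buffered-stats holder (names are hypothetical).
struct StatsBuffer {
    inner: Mutex<Inner>,
    flush_interval: Duration,
}

impl StatsBuffer {
    fn with_flush_interval(flush_interval: Duration) -> Self {
        Self {
            inner: Mutex::new(Inner {
                counts: HashMap::new(),
                last_flush: Instant::now(),
            }),
            flush_interval,
        }
    }

    // Hot path: just bump a counter. No tag normalization, no backend call.
    fn bump(&self, step: &str) {
        let mut inner = self.inner.lock().unwrap();
        *inner.counts.entry(step.to_string()).or_insert(0) += 1;
    }

    // Cold path: when the window has elapsed, drain the buffer and emit one
    // aggregated value per step to the real metrics backend via `emit`.
    fn flush_if_due(&self, emit: impl Fn(&str, u64)) {
        let mut inner = self.inner.lock().unwrap();
        if inner.last_flush.elapsed() < self.flush_interval {
            return;
        }
        for (step, count) in inner.counts.drain() {
            emit(&step, count);
        }
        inner.last_flush = Instant::now();
    }
}
```

With a 10-second window, steps that process thousands of rows per second produce a handful of backend calls per window instead of one per row.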
```diff
 fn submit(&mut self, message: Message<RoutedValue>) -> Result<(), SubmitError<RoutedValue>> {
-    if self.route != message.payload().route {
+    if self.route != message.payload().route || message.payload().payload.is_watermark_msg() {
```
Moved here in order not to count watermarks in metrics
Should we add a tag or use a different metric for watermarks? I would think we'd still like metrics on watermark messages, but tracked separately from main messages.
I think this is a good point. More importantly, we should emit metrics both when watermarks are emitted and when they are committed.
Let me think it over.
I will send a PR that covers this for the overall pipeline:
https://linear.app/getsentry/issue/STREAM-921/add-metrics-to-track-watermarks-per-step
evanh left a comment:
Generally approved, added one comment.
Summary
We are not collecting metrics from Rust steps. This adds the equivalent abstractions we have in Python.
The next step will be to replace the Python implementations with the Rust ones.
Adds a Rust `pipeline_stats` module aligned with Python `PipelineStats` (`stats.py`): in-memory aggregation of per-step exec counts, error counts, and max duration per flush window (10s), then emission through the `metrics` crate only (`streams.pipeline.input.messages`, `errors`, `duration` with a `step` label). No Arroyo types on the recording path.

Changes

- Adds `step_name` on `Batch` and `Filter`; the Python adapter passes pipeline step names.
- `BatchStep` records `step_exec` per ingested streaming row and `step_timing` on batch flush (`b.flush()`).
- `Filter` records `step_exec`/`step_timing`/`step_error` around the Python predicate; the `PredicateFilter` closure no longer calls Python `input_metrics`/`output_metrics`, to avoid double counting.

Made with Cursor
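The Filter recording described in the summary can be illustrated with a hedged, self-contained sketch. This is not the rust_streams implementation: `filter_with_stats` and the `record` hook are hypothetical stand-ins for the step plus its stats recorder, and the real predicate crosses into Python. It only shows the shape of recording `step_exec`, `step_timing`, and `step_error` around each predicate call.

```rust
use std::time::Instant;

// Hypothetical sketch: run `predicate` over each row, keeping rows where it
// returns Ok(true), and record stats around every call via `record`.
// `record(metric_name, value)` stands in for the buffered stats recorder.
fn filter_with_stats<T>(
    rows: Vec<T>,
    predicate: impl Fn(&T) -> Result<bool, String>,
    mut record: impl FnMut(&str, f64),
) -> Vec<T> {
    let mut kept = Vec::new();
    for row in rows {
        let start = Instant::now();
        record("step_exec", 1.0); // one execution counted per row
        match predicate(&row) {
            Ok(true) => kept.push(row),
            Ok(false) => {}                       // filtered out, still counted
            Err(_) => record("step_error", 1.0),  // predicate raised
        }
        // timing covers the predicate call itself
        record("step_timing", start.elapsed().as_secs_f64());
    }
    kept
}
```

Because the recorder only buffers counters (as in the stats module above), wrapping every row this way stays cheap even for trivial predicates.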