feat(dogstatsd): decouple reader from processor #75
Conversation
Pull request overview
This PR refactors the DogStatsD server to improve throughput under burst traffic by decoupling packet reading from metric processing. Previously, the single-threaded spin() loop read one packet, parsed it, and inserted metrics into the aggregator before reading the next packet. This caused packets to accumulate in the kernel's SO_RCVBUF buffer, which is capped at ~416 KiB on Lambda, leading to packet drops.
The new architecture spawns a dedicated reader task that continuously drains the UDP socket into a bounded in-memory channel, while the main task processes packets from that channel. This moves buffering from kernel space to user space (configurable via queue_size, defaulting to ~8 MB), significantly reducing packet loss during bursts.
Changes:
- Introduced a bounded channel (`queue_size`: 1024 packets) between socket reading and metric processing
- Refactored `spin()` to spawn a reader task and process packets from a channel
- Extracted `insert_metrics`, `prepend_namespace`, and `process_packet` as standalone functions
- Added `queue_size` configuration option to `DogStatsDConfig`
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| crates/dogstatsd/src/dogstatsd.rs | Core refactoring: split single loop into read_loop (spawned task) and process_loop, added queue_size config, refactored helper functions to work without self reference |
| crates/dogstatsd/tests/integration_test.rs | Added queue_size: None to all DogStatsDConfig test instances |
| crates/datadog-serverless-compat/src/main.rs | Added queue_size: None to DogStatsDConfig initialization |
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.
Split spin() into a dedicated reader task and processor task connected by a bounded channel. The reader drains the socket into user-space buffering (~8 MB channel vs ~416 KiB kernel SO_RCVBUF cap), while the processor handles parsing and aggregation independently. This prevents head-of-line blocking, where processing time caused kernel buffer overflow and silent packet drops under high-volume metric bursts.

- Add configurable queue_size (DD_DOGSTATSD_QUEUE_SIZE, default 1024)
- Extract process_packet() and insert_metrics() as free functions
- Add read_loop() with cancellation-aware tokio::select! on reads
- Add process_loop() that drains remaining packets on shutdown
- Handle named pipe ConnectionReset as clean transport close
- Validate queue_size=0 with fallback to default

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
duncanpharvey
left a comment
Looks good! Added a few comments for consideration.
What does this PR do?
Splits the single-threaded spin() loop into two cooperating tasks connected by a bounded channel:
- a reader task that continuously drains the UDP socket into the channel
- a processor task that pulls packets off the channel, parses them, and inserts metrics into the aggregator
Also adds a configurable queue_size option to DogStatsDConfig (matching the Go agent's DD_DOGSTATSD_QUEUE_SIZE), defaulting to 1024.
Motivation
The previous spin() loop read one packet, parsed every metric in it, inserted them into the aggregator, and only then read the next packet. While the processor was busy, incoming packets accumulated in the kernel's SO_RCVBUF, which on Lambda is capped at ~416 KiB (~7 packets at 64 KiB). Under burst traffic (e.g. 300k metrics), the kernel silently drops packets that overflow the buffer.
By decoupling the reader from the processor, buffering moves from kernel space (~416 KiB, not configurable on Lambda) to user space (queue_size × buffer_size ≈ 8 MB at defaults). The reader never blocks on processing, so the kernel buffer stays drained. This is where the bulk of the metric recovery happens.
SVLS-8551
Notes
https://github.com/DataDog/datadog-agent/blob/85939a62b5580b2a15549f6936f257e61c5aa153/pkg/config/config_template.yaml#L2184-L2190
Describe how to test/QA your changes
Sending 300k metrics over a 3-second span, received metrics improve from ~145k to around 210k.