fix(dogstatsd): decouple socket reading from metric processing #76
duncanista merged 2 commits into main
Conversation
Pull request overview
This PR optimizes the DogStatsD metric reader task to reduce per-packet overhead and improve burst absorption through three key optimizations: a buffer pool to eliminate allocations in steady state, a batch drain loop to empty the kernel buffer without re-entering tokio's event loop, and replacing tokio::select! with is_cancelled() checks for lower per-iteration overhead.
Changes:
- Introduced a Packet struct to carry pooled buffers with metadata through the processing pipeline
- Added BufferReader::try_read_into() for non-blocking packet reads to enable batch draining
- Implemented an unbounded channel-based buffer pool where the processor returns buffers to the reader for reuse
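The buffer-recycling pattern behind the pool can be sketched with std channels (the PR itself uses tokio unbounded channels; the helper name `get_buf` and buffer size are illustrative):

```rust
use std::sync::mpsc;

// Pull a recycled buffer from the pool, or allocate on cold start.
// In steady state every buffer the processor finishes with comes back
// through pool_tx, so the reader never hits the allocator.
fn get_buf(pool_rx: &mpsc::Receiver<Vec<u8>>, buf_size: usize) -> Vec<u8> {
    pool_rx.try_recv().unwrap_or_else(|_| vec![0u8; buf_size])
}

fn main() {
    let (pool_tx, pool_rx) = mpsc::channel::<Vec<u8>>();
    // Cold start: pool is empty, so this allocates a fresh buffer.
    let buf = get_buf(&pool_rx, 8192);
    assert_eq!(buf.len(), 8192);
    // Processor is done with the packet: return the buffer to the pool.
    pool_tx.send(buf).unwrap();
    // Steady state: the same allocation is handed back to the reader.
    let reused = get_buf(&pool_rx, 8192);
    assert_eq!(reused.len(), 8192);
}
```

Because buffers only enter the pool when the processor releases them, the pool can never grow past the number of packets in flight, which is what bounds it by the data channel capacity.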
kathiehuang left a comment:
I think this makes sense but this is my first time looking at dogstatsd code 😅 feel free to request other reviewers too!
If you could respond to copilot's comments that would be helpful too!
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.
```rust
// Drain any packets already in the kernel buffer without going
// through tokio's event loop — just raw non-blocking syscalls.
loop {
    let mut buf = pool_rx.try_recv().unwrap_or_else(|_| vec![0u8; buf_size]);
    match reader.try_read_into(&mut buf) {
        Ok(Some((len, source))) => {
            if !try_send_packet(&tx, Packet { buf, len, source }, &mut dropped, &pool_tx) {
                break;
            }
        }
        Ok(None) => {
            let _ = pool_tx.send(buf);
            break;
        }
```
The batch-drain loop runs unconditionally, but try_read_into() returns Ok(None) for non-UDP transports. That means each iteration still pulls a buffer from the pool (or allocates on cold start) and immediately returns it, adding extra channel traffic and potentially an extra allocation per packet for named pipes/mirror transport. Consider gating the drain loop to only run for UdpSocket (e.g., via a supports_try_read()/is_udp() check) so non-UDP paths don’t pay this overhead.
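The suggested gate could be sketched as a default trait method that only the UDP transport overrides (trait and type names here are hypothetical, not the PR's actual API):

```rust
// Hypothetical reader abstraction sketching the reviewer's suggestion.
trait TransportReader {
    // Default: transport does not support non-blocking batch reads.
    fn supports_try_read(&self) -> bool {
        false
    }
}

struct UdpReader;
struct NamedPipeReader;

impl TransportReader for UdpReader {
    fn supports_try_read(&self) -> bool {
        true // only UDP sockets can be drained with raw non-blocking reads
    }
}

impl TransportReader for NamedPipeReader {} // inherits the `false` default

fn main() {
    // The batch-drain loop would be entered only when this returns true.
    assert!(UdpReader.supports_try_read());
    assert!(!NamedPipeReader.supports_try_read());
}
```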
The overhead is trivial — one try_recv() + one send() on unbounded channels per packet for named pipes. Adding a supports_try_read() method adds API surface and branching for a path that barely matters. Named pipes are Windows-only and low-volume compared to UDP.
Reduce per-packet overhead in the reader loop with three optimizations:

- Buffer pool: cycle Vec<u8> buffers between reader and processor via an unbounded channel, eliminating heap allocation per read in steady state. Pool size is naturally bounded by the data channel capacity.
- Batch drain: after each async read, call try_read_into() in a tight loop to drain all pending kernel packets without re-entering tokio's event loop. Most impactful after hypervisor scheduling pauses on Lambda.
- Fast cancellation check: use is_cancelled() (AtomicBool load) at the top of each iteration, with tokio::select! on the async read to ensure the reader exits promptly even on a quiet socket.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
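The fast-cancellation idea reduces to a single atomic load per iteration. A minimal std-only sketch (CancelFlag is a stand-in for a real cancellation token; the actual code pairs this check with tokio::select! on the async read):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// Stand-in for a cancellation token: is_cancelled() is one atomic load,
// much cheaper than polling a cancellation future on every iteration.
#[derive(Clone)]
struct CancelFlag(Arc<AtomicBool>);

impl CancelFlag {
    fn new() -> Self {
        CancelFlag(Arc::new(AtomicBool::new(false)))
    }
    fn cancel(&self) {
        self.0.store(true, Ordering::SeqCst);
    }
    fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

fn main() {
    let flag = CancelFlag::new();
    let mut iterations = 0;
    // Reader loop shape: cheap check at the top of each iteration.
    while !flag.is_cancelled() {
        iterations += 1;
        if iterations == 3 {
            flag.cancel(); // simulate a shutdown signal arriving
        }
    }
    assert_eq!(iterations, 3);
}
```

The atomic check alone cannot wake a reader parked on a quiet socket, which is why the commit keeps tokio::select! around the blocking read for prompt exit.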
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
What does this PR do?
Three optimizations to the reader task introduced in the previous PR, reducing per-packet overhead and improving burst absorption: a buffer pool, a batch drain loop, and a fast cancellation check.
Motivation
The batch drain loop (try_read_into → try_send → repeat) processes all pending packets in a tight syscall loop with zero allocations, emptying the kernel buffer before the next scheduling pause can cause drops.
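The drain semantics can be modeled with a mock in place of the real socket (try_read_into here simulates the kernel buffer with a queue; a real UDP read would surface WouldBlock when nothing is pending):

```rust
use std::collections::VecDeque;
use std::io;

// Mocked non-blocking read: Ok(Some(len)) while packets are pending,
// Ok(None) once the simulated kernel buffer is empty.
fn try_read_into(kernel: &mut VecDeque<Vec<u8>>, buf: &mut Vec<u8>) -> io::Result<Option<usize>> {
    match kernel.pop_front() {
        Some(pkt) => {
            let len = pkt.len();
            buf[..len].copy_from_slice(&pkt);
            Ok(Some(len))
        }
        None => Ok(None), // real code maps WouldBlock to "nothing pending"
    }
}

fn main() {
    let mut kernel: VecDeque<Vec<u8>> = VecDeque::from(vec![vec![1u8; 4], vec![2u8; 4]]);
    let mut buf = vec![0u8; 8192];
    let mut drained = 0;
    // Batch drain: tight loop until the kernel buffer is empty,
    // never yielding back to the async executor in between.
    while let Ok(Some(_len)) = try_read_into(&mut kernel, &mut buf) {
        drained += 1;
    }
    assert_eq!(drained, 2);
}
```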
SVLS-8552