Optimize/marshalto zero alloc#116
Conversation
Use sync.Pool to reuse bytes.Buffer in (*Set).WritePrometheus, eliminating repeated bytes.growSlice allocations. Production pprof shows alloc_space dropped by 98.6% (12,380 MB → 170 MB) and alloc_objects dropped by 96.6% (71.6M → 2.5M). Signed-off-by: NickPak <bc92@foxmail.com>
…f with strconv Change the internal metric interface marshalTo signature from io.Writer to *bytes.Buffer, enabling: - WriteString/WriteByte instead of Write([]byte), avoiding string → []byte copy allocations - strconv.AppendUint/AppendFloat instead of fmt.Fprintf, eliminating format parsing and interface boxing overhead - Compiler can inline concrete type method calls without interface dispatch Production pprof shows alloc_space dropped by 62.1% (170 MB → 64 MB) and alloc_objects dropped by 71.7% (2.5M → 699K) compared to the sync.Pool-only fix. marshalTo no longer appears in the allocation profile. Signed-off-by: NickPak <bc92@foxmail.com>
There was a problem hiding this comment.
1 issue found across 8 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="metrics.go">
<violation number="1" location="metrics.go:46">
P2: Pooled buffers may retain oversized capacity leading to memory retention. `bytes.Buffer.Reset()` does not shrink capacity - large buffers that grow during metrics export will be retained in the pool. Consider truncating the buffer's backing array before `bufPool.Put()` to prevent unbounded memory growth from rare large scrapes.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
counter.go
Outdated
| func (c *Counter) marshalTo(prefix string, w *bytes.Buffer) { | ||
| v := c.Get() | ||
| fmt.Fprintf(w, "%s %d\n", prefix, v) | ||
| var buf [32]byte |
There was a problem hiding this comment.
You may want to use bytes.Buffer directly https://pkg.go.dev/bytes#Buffer.AvailableBuffer
There was a problem hiding this comment.
Great suggestion! Using w.AvailableBuffer() is indeed better than declaring a local var buf [32]byte — it appends directly into the buffer's internal spare capacity, avoiding an extra copy on w.Write(b). I'll update the code to use this approach. Thanks!
Replace var buf [32]byte with w.AvailableBuffer() in all marshalTo implementations. This appends directly into the buffer's internal spare capacity, eliminating an extra copy on w.Write(b). Signed-off-by: NickPak <bc92@foxmail.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #116 +/- ##
==========================================
+ Coverage 88.34% 88.97% +0.62%
==========================================
Files 12 12
Lines 1399 1478 +79
==========================================
+ Hits 1236 1315 +79
Misses 112 112
Partials 51 51 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
It'd be great to have a benchmark for each metric type. For example, currently histogram produces 44 allocs, prometheus histogram 63 and so on. It's possible to reduce to zero, but I don't really like result code - it's hard to read and maintain. |
Benchmark ResultsEnv: Windows amd64, Intel i7-13700K, Go 1.24.0, count=5, benchtime=2s allocs/op comparison
B/op comparison
ns/op comparison
Note: The remaining 2 allocs/op come from Here's the English translation: PR #116 delivers stunning results for simple metrics (Counter/Gauge/Summary):
Histogram allocs dropped from 98→61 (−38%):
Compared to Prometheus, PR #116 dominates across the board:
Regarding the 2 residual allocs — they come from First of all, I want to say — VictoriaMetrics/metrics is one of the best-designed metrics libraries I've come across. The simplicity and performance you've achieved with this project are truly impressive, and I really appreciate the work you've put into it. Regarding the readability concern — I completely understand where you're coming from, and maintainability should always be a top priority. I'd like to offer a slightly different perspective though: The pattern used in this PR — replacing
So this isn't a hack or an unconventional trick — it's the Go community's recognized pattern for allocation-free serialization. Also, I want to emphasize that the scope of changes is intentionally kept narrow: only the internal |
|
Thanks for the provided details. Could you please attach the following benchmark ( or similar) to the PR: It should help to measure impact of optimisations for everyone. Mostly I'm happy with proposed changes |
Signed-off-by: NickPak <bc92@foxmail.com>
|
I've added per-type benchmarks in the corresponding _test.go files as well as a mixed-workload benchmark (BenchmarkMixedSet_WritePrometheus) in set_test.go, covering Counter, FloatCounter, Gauge, Histogram, PrometheusHistogram, and Summary serialization. |
perf: change marshalTo to accept *bytes.Buffer and replace fmt.Fprintf with strconv
Change the internal metric interface marshalTo signature from
io.Writer to *bytes.Buffer, enabling:
string → []byte copy allocations
eliminating format parsing and interface boxing overhead
interface dispatch
Production pprof shows alloc_space dropped by 62.1% (170 MB → 64 MB)
and alloc_objects dropped by 71.7% (2.5M → 699K) compared to the
sync.Pool-only fix. marshalTo no longer appears in the allocation profile.
Summary by cubic
Eliminates nearly all allocations during Prometheus marshaling by pooling
bytes.Bufferand usingstrconvappends; adds benchmarks to track allocations. Public API stays the same; marshaling is faster and lighter.metric.marshalToto accept*bytes.Buffer; updated all metric types andquantileValue.fmt.FprintfwithWriteString/WriteByteandstrconv.Append*intobytes.Buffer.AvailableBuffer()to avoid copies and boxing.Set.WritePrometheusviasync.Poolto prevent growth reallocations.strconv; alloc_objects −96.6% then −71.7%.Written for commit cae014b. Summary will update on new commits.