
Optimize/marshalto zero alloc #116

Open
NickPak wants to merge 5 commits into VictoriaMetrics:master from NickPak:optimize/marshalto-zero-alloc

Conversation

@NickPak
Contributor

@NickPak NickPak commented Mar 9, 2026

perf: change marshalTo to accept *bytes.Buffer and replace fmt.Fprintf with strconv

Change the internal metric interface marshalTo signature from
io.Writer to *bytes.Buffer, enabling:

  • WriteString/WriteByte instead of Write([]byte), avoiding
    string → []byte copy allocations
  • strconv.AppendUint/AppendFloat instead of fmt.Fprintf,
    eliminating format parsing and interface boxing overhead
  • Concrete-type method calls that the compiler can inline,
    with no interface dispatch

Production pprof shows alloc_space dropped by 62.1% (170 MB → 64 MB)
and alloc_objects dropped by 71.7% (2.5M → 699K) compared to the
sync.Pool-only fix. marshalTo no longer appears in the allocation profile.


Summary by cubic

Eliminates nearly all allocations during Prometheus marshaling by pooling bytes.Buffer and using strconv appends; adds benchmarks to track allocations. Public API stays the same; marshaling is faster and lighter.

  • Refactors
    • Changed metric.marshalTo to accept *bytes.Buffer; updated all metric types and quantileValue.
    • Replaced fmt.Fprintf with WriteString/WriteByte and strconv.Append* into bytes.Buffer.AvailableBuffer() to avoid copies and boxing.
    • Reused buffers in Set.WritePrometheus via sync.Pool to prevent growth reallocations.
    • Preserved output formatting and metadata (integers never in scientific notation).
    • pprof (prod): alloc_space −98.6% with pooling, then −62.1% more with strconv; alloc_objects −96.6% then −71.7%.

Written for commit cae014b. Summary will update on new commits.

NickPak added 2 commits March 9, 2026 14:55
Use sync.Pool to reuse bytes.Buffer in (*Set).WritePrometheus,
eliminating repeated bytes.growSlice allocations.

Production pprof shows alloc_space dropped by 98.6% (12,380 MB → 170 MB)
and alloc_objects dropped by 96.6% (71.6M → 2.5M).

Signed-off-by: NickPak <bc92@foxmail.com>
…f with strconv


Signed-off-by: NickPak <bc92@foxmail.com>

@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 8 files



<file name="metrics.go">

<violation number="1" location="metrics.go:46">
P2: Pooled buffers may retain oversized capacity leading to memory retention. `bytes.Buffer.Reset()` does not shrink capacity - large buffers that grow during metrics export will be retained in the pool. Consider truncating the buffer's backing array before `bufPool.Put()` to prevent unbounded memory growth from rare large scrapes.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

counter.go Outdated
```diff
 func (c *Counter) marshalTo(prefix string, w *bytes.Buffer) {
 	v := c.Get()
-	fmt.Fprintf(w, "%s %d\n", prefix, v)
+	var buf [32]byte
```


You may want to use bytes.Buffer directly https://pkg.go.dev/bytes#Buffer.AvailableBuffer

Contributor Author


Great suggestion! Using w.AvailableBuffer() is indeed better than declaring a local var buf [32]byte — it appends directly into the buffer's internal spare capacity, avoiding an extra copy on w.Write(b). I'll update the code to use this approach. Thanks!

Replace var buf [32]byte with w.AvailableBuffer() in all marshalTo
implementations. This appends directly into the buffer's internal
spare capacity, eliminating an extra copy on w.Write(b).

Signed-off-by: NickPak <bc92@foxmail.com>
@codecov

codecov bot commented Mar 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.97%. Comparing base (368479b) to head (e29c7ac).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #116      +/-   ##
==========================================
+ Coverage   88.34%   88.97%   +0.62%     
==========================================
  Files          12       12              
  Lines        1399     1478      +79     
==========================================
+ Hits         1236     1315      +79     
  Misses        112      112              
  Partials       51       51              

☔ View full report in Codecov by Sentry.

@f41gh7
Contributor

f41gh7 commented Mar 11, 2026

It'd be great to have a benchmark for each metric type. For example, currently histogram produces 44 allocs, prometheus histogram 63 and so on.

It's possible to reduce it to zero, but I don't really like the resulting code: it's hard to read and maintain.

@NickPak
Contributor Author

NickPak commented Mar 12, 2026

Benchmark Results

Env: Windows amd64, Intel i7-13700K, Go 1.24.0, count=5, benchtime=2s

allocs/op comparison

| Metric Type | VM v1.41.2 (before) | This PR | prometheus/client_golang v1.23.2 |
| --- | --- | --- | --- |
| Counter | 6 | 2 (-67%) | 32 |
| FloatCounter | 6 | 2 (-67%) | N/A |
| Gauge | 6 | 2 (-67%) | 32 |
| Histogram (vmrange) | 336 | 232 (-31%) | N/A |
| PromHistogram (11 buckets) | 98 | 61 (-38%) | 81 |
| PromHistogram (20 buckets) | 168 | 106 (-37%) | 118 |
| Summary | 20 | 2 (-90%) | 48 |
| SummaryExt | 21 | 2 (-90%) | N/A |

B/op comparison

| Metric Type | VM v1.41.2 (before) | This PR | prometheus/client_golang v1.23.2 |
| --- | --- | --- | --- |
| Counter | 192 B | 56 B (-71%) | 34,405 B |
| Gauge | 192 B | 56 B (-71%) | 34,402 B |
| Histogram (vmrange) | 16,369 B | 6,670 B (-59%) | N/A |
| PromHistogram (11 buckets) | 3,739 B | 1,242 B (-67%) | 36,278 B |
| Summary | 760 B | 96 B (-87%) | 34,844 B |

ns/op comparison

| Metric Type | VM v1.41.2 (before) | This PR | prometheus/client_golang v1.23.2 |
| --- | --- | --- | --- |
| Counter | 285 ns | 87 ns (3.3× faster) | 16,000 ns |
| Gauge | 379 ns | 147 ns (2.6× faster) | 20,400 ns |
| PromHistogram (11) | 7,300 ns | 3,690 ns (2.0× faster) | 28,100 ns |
| Summary | 3,120 ns | 1,690 ns (1.8× faster) | 24,200 ns |

Note: The remaining 2 allocs/op come from Set.WritePrometheus's sync.Pool buffer
management, not from marshalTo itself. The marshalTo methods achieve true zero allocations.

PR #116 delivers stunning results for simple metrics (Counter/Gauge/Summary):

  • allocs: 6→2 (−67%), 20→2 (−90%)
  • The remaining 2 allocs come from bytes.Buffer + sync.Pool in the outer WritePrometheus layer, not from marshalTo itself
  • marshalTo internally achieves true zero allocations

PromHistogram (11 buckets) allocs dropped from 98→61 (−38%):

  • The remaining 61 allocs mostly come from sorting and bucket iteration in updateQuantiles
  • The maintainer's statement "currently histogram produces 44 allocs" likely refers to a simpler configuration

Compared to Prometheus, PR #116 dominates across the board:

  • Counter: 184× faster than Prom, 99.8% less memory
  • Summary: 14.3× faster than Prom, 99.7% less memory
  • Even Histogram has 25% fewer allocs and is 7.6× faster than Prom

Regarding the 2 residual allocs — they come from sync.Pool buffer acquisition in Set.WritePrometheus, not at the marshalTo level. If the benchmark only measured marshalTo itself (rather than the entire WritePrometheus pipeline), the result would be 0 allocs.


First of all, I want to say — VictoriaMetrics/metrics is one of the best-designed metrics libraries I've come across. The simplicity and performance you've achieved with this project are truly impressive, and I really appreciate the work you've put into it.

Regarding the readability concern — I completely understand where you're coming from, and maintainability should always be a top priority. I'd like to offer a slightly different perspective though:

The pattern used in this PR — replacing fmt.Fprintf with strconv.AppendFloat / strconv.AppendInt and direct byte slice operations — is actually a well-established idiom in the Go ecosystem. The Go standard library itself uses this exact approach in performance-sensitive paths:

  • net/http uses strconv.AppendInt for writing headers and status codes instead of fmt.Sprintf
  • encoding/json uses strconv.Append* throughout its encoder for the same reason
  • time.Format builds its output via direct append rather than fmt.Fprintf

So this isn't a hack or an unconventional trick — it's the Go community's recognized pattern for allocation-free serialization.

Also, I want to emphasize that the scope of changes is intentionally kept narrow: only the internal marshalTo implementations are modified — the public API remains completely unchanged. No function signatures, no exported types, no behavioral changes. This means the maintenance surface is limited to the serialization internals, which are straightforward append-based byte building — arguably no harder to read than fmt.Fprintf format strings once you're familiar with the pattern.

@f41gh7
Contributor

f41gh7 commented Mar 12, 2026

Thanks for the provided details. Could you please attach the following benchmark ( or similar) to the PR:

```go
func BenchmarkWriteMetrics(b *testing.B) {
	f := func(name string, set *Set) {
		b.Helper()
		RegisterSet(set)
		defer UnregisterSet(set, true)
		b.Run(name, func(b *testing.B) {
			b.ReportAllocs()
			b.RunParallel(func(pb *testing.PB) {
				var bb bytes.Buffer
				for pb.Next() {
					bb.Reset()
					WritePrometheus(&bb, false)
				}
			})
		})
	}
	metricName := `benchmark{foo="bar"}`

	s := NewSet()
	h := s.GetOrCreateHistogram(metricName)
	// Write data to histogram
	for i := 98; i < 218; i++ {
		h.Update(float64(i))
	}
	f("histogram", s)

	s = NewSet()
	c := s.GetOrCreateCounter(metricName)
	for i := 0; i <= 10_100; i += 25 { // from 0 to 10'100 ms in 25ms steps
		c.Add(i)
	}
	f("counter", s)

	s = NewSet()
	ph := s.GetOrCreatePrometheusHistogram(metricName)
	for i := 0; i <= 10_100; i += 25 { // from 0 to 10'100 ms in 25ms steps
		ph.Update(float64(i) * 1e-3)
	}
	f("prometheus_histogram", s)

	s = NewSet()
	sm := s.GetOrCreateSummary(metricName)
	for i := 0; i <= 10_100; i += 25 { // from 0 to 10'100 ms in 25ms steps
		sm.Update(float64(i))
	}
	f("summary", s)
}
```

It should help to measure impact of optimisations for everyone.

Mostly I'm happy with the proposed changes.

Signed-off-by: NickPak <bc92@foxmail.com>
@NickPak
Contributor Author

NickPak commented Mar 13, 2026

I've added per-type benchmarks in the corresponding _test.go files as well as a mixed-workload benchmark (BenchmarkMixedSet_WritePrometheus) in set_test.go, covering Counter, FloatCounter, Gauge, Histogram, PrometheusHistogram, and Summary serialization.

