perf(tsdb/index): allocation-free postings offset table scan by simonswine · Pull Request #5159 · grafana/pyroscope

simonswine · 2026-05-15T12:05:23Z

Summary

newReader spent ~96% of its allocations inside ReadOffsetTable: every entry materialised a []string plus two string clones for name and value, even for the ~31/32 entries that symbolFactor sampling immediately discards.

This PR introduces readPostingsOffsetTable, a sibling of ReadOffsetTable that yields (name, value []byte) slices aliasing the index buffer instead of allocating strings. The newReader callback:

- uses yoloString for map lookups (zero allocation),
- defers string(name) / string(value) conversion until an entry is actually retained, and
- tracks the "last value" candidate as raw byte-slice aliases, cloning only at flush time.
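A minimal sketch of the sampling callback described above, written as a loop over already-decoded entries. It assumes symbolFactor is 32 (consistent with the ~31/32 discard rate) and uses hypothetical types (entry, postingOffset) plus a local yoloString built on unsafe.String (Go 1.20+). It is modeled on the linked Prometheus logic rather than copied from this PR: the PR tracks both lastNameB and lastValueB as aliases, while this variant keeps an owned copy of the current name and aliases only the pending last value.

```go
package main

import "unsafe"

// symbolFactor is assumed to be 32 here; the real constant lives in the index package.
const symbolFactor = 32

// postingOffset is a hypothetical stand-in for a retained (value, offset) pair.
type postingOffset struct {
	value string
	off   int
}

// entry is a hypothetical decoded table row; name and value alias a shared buffer.
type entry struct {
	name, value []byte
	off         int
}

// yoloString reinterprets b as a string without copying. The result must only
// be used transiently (e.g. as a lookup key) and never stored.
func yoloString(b []byte) string {
	return unsafe.String(unsafe.SliceData(b), len(b))
}

// buildSampledPostings keeps every symbolFactor-th value of each label name plus
// that name's last value. Entries must be sorted by name, then value.
func buildSampledPostings(entries []entry) map[string][]postingOffset {
	postings := map[string][]postingOffset{}
	var (
		curName   string // owned copy of the current label name
		lastValue []byte // alias of the newest unsampled value, not a copy
		lastOff   int
		hasLast   bool
		count     int
	)
	// flushLast copies the pending "last value" alias into an owned string.
	flushLast := func() {
		if hasLast {
			postings[curName] = append(postings[curName], postingOffset{value: string(lastValue), off: lastOff})
			hasLast = false
		}
	}
	for _, e := range entries {
		// Zero-copy lookup: the map does not keep the yoloString key.
		if _, ok := postings[yoloString(e.name)]; !ok {
			flushLast()              // finish the previous label name
			curName = string(e.name) // one copy per label name
			postings[curName] = nil
			count = 0
		}
		if count%symbolFactor == 0 {
			// Sampled entry: copy the value now because it is retained.
			postings[curName] = append(postings[curName], postingOffset{value: string(e.value), off: e.off})
			hasLast = false
		} else {
			// Unsampled entry: remember it as an alias; no allocation.
			lastValue, lastOff, hasLast = e.value, e.off, true
		}
		count++
	}
	flushLast()
	return postings
}
```

In this sketch, every key stored in the map is an owned string and every retained value is copied, so no retained entry depends on the input buffer; only entries that survive sampling pay for an allocation.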

Impact

This path accounts for 66% of all allocations (by count) in query-backends, and a very small share in compaction workers.

Benchmark

BenchmarkNewReader with 10 label names × 200 values each:

             │      sec/op       │      sec/op      vs base  │
NewReader-12      305.1µ ± 1%       124.2µ ± 1%    -59.30%

             │       B/op        │       B/op       vs base  │
NewReader-12      270.7Ki ± 0%      169.8Ki ± 0%   -37.30%

             │     allocs/op     │   allocs/op    vs base    │
NewReader-12       6075.0 ± 0%       244.0 ± 0%   -95.98%

Allocation count drops from 6 075 → 244 (−96%), wall time from 305 µs → 124 µs (−59%).
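The headline figures follow from the medians as (new - old) / old, and the sec/op ratio 305.1 / 124.2 ≈ 2.46 means NewReader is roughly 2.5× faster. A small helper reproduces them; differences in the second decimal (for example about -37.27% here vs the reported -37.30% for B/op) are likely due to rounding of the displayed medians. pctChange and speedup are illustrative names, not part of the PR.

```go
package main

// pctChange returns the relative change from base to cur in percent,
// i.e. (cur - base) / base * 100.
func pctChange(base, cur float64) float64 {
	return (cur - base) / base * 100
}

// speedup returns how many times faster cur is than base for a time metric.
func speedup(base, cur float64) float64 {
	return base / cur
}
```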

Notes

readPostingsOffsetTable is unexported and used only by newReader and PostingsRanges.
This builds on #5158 (refactor(tsdb/index): drop FormatV1 reader support), which removes the V1 branch that previously shared ReadOffsetTable.
Inspired by a newer version of the Prometheus code base: https://github.com/prometheus/prometheus/blob/43e5fc6a248c1a73d30dff86428f74d11aedbedc/tsdb/index/index.go#L1297-L1302

Test plan

go test ./pkg/phlaredb/tsdb/index/... -run . -bench BenchmarkNewReader passes and shows regression-free numbers
Existing index reader tests (TestNewReader, table-driven cases) pass unchanged

Note

Medium Risk
Touches TSDB index reading/parsing and introduces more buffer aliasing ([]byte/yoloString) to avoid allocations. Although the change is localized and covered by tests and benchmarks, misuse could cause subtle lifetime/aliasing bugs.
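The lifetime risk can be illustrated with plain slices. In this hypothetical example, buf stands in for a buffer that is later overwritten (the index reader's buffer is not necessarily reused this way) and name for a []byte handed to a callback: a retained alias observes the overwrite, while a string conversion taken beforehand does not.

```go
package main

// aliasHazard is a hypothetical illustration of retaining an alias vs a copy.
func aliasHazard() (kept, copied string) {
	buf := []byte("job=api")
	name := buf[:3] // aliases buf's backing array, no copy

	retained := name      // keeps the alias: sees later writes to buf
	owned := string(name) // string conversion copies the bytes now

	copy(buf, "env") // buf is reused for unrelated data
	return string(retained), owned
}
```

Here aliasHazard returns "env" for the retained alias and "job" for the copy, which is why the callback converts to string before retaining an entry.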

Overview
Reduces NewReader startup allocations/time by replacing the generic ReadOffsetTable postings scan with an allocation-free readPostingsOffsetTable that returns (name, value []byte) slices aliasing the index buffer, and only materializes string values for sampled/required entries (including correct handling of the per-label “last value” entry).

Updates PostingsRanges to use the new postings-table iterator, adjusts the label-indices test to decode the table directly, and adds BenchmarkNewReader to track open-index performance/regressions.

Reviewed by Cursor Bugbot for commit e07c1e0. Bugbot is set up for automated code reviews on this repo.

newReader spent ~96% of its allocations in ReadOffsetTable: each entry allocated a []string plus string clones for name and value, even for the ~31/32 entries discarded by symbolFactor sampling.

Switch to readPostingsOffsetTable, which yields (name, value []byte) aliasing the index buffer. The callback uses yoloString for map lookups, defers string conversion until an entry is retained, and tracks the "last" candidate as raw byte-slice aliases, cloning only at flush time.

BenchmarkNewReader (10 label names × 200 values):

             │    sec/op     │     sec/op       vs base  │
NewReader-12      305.1µ          124.2µ ± 1%    -59.30%

             │     B/op      │      B/op        vs base  │
NewReader-12      270.7Ki         169.8Ki ± 0%   -37.30%

             │   allocs/op   │   allocs/op      vs base  │
NewReader-12      6075.0          244.0 ± 0%     -95.98%

aleks-p

Great improvement!

aleks-p · 2026-05-15T14:04:22Z

 	}

-	var lastKey []string
+	// lastNameB/lastValueB alias the mmap buffer and are only converted to


Nit: maybe we can remove the mention of mmap here to avoid confusion; it looks like the data is always heap-allocated (and it shouldn't matter in general).

Likely carried over from the Prometheus context 👍

Fixed in e07c1e0


simonswine requested a review from korniltsev-grafanista as a code owner May 15, 2026 12:05

aleks-p previously approved these changes May 15, 2026


docs(tsdb/index): remove mmap references from comments

e07c1e0

Address review feedback — the reader uses heap-allocated buffers, not mmap, so the term was misleading.

simonswine dismissed aleks-p’s stale review via e07c1e0 May 15, 2026 14:12

aleks-p approved these changes May 15, 2026


simonswine merged commit 8c1d10c into grafana:main May 15, 2026
33 checks passed

simonswine merged 2 commits into grafana:main from simonswine:20260514_index-no-allocs-offset-table

simonswine commented May 15, 2026 (edited by cursor Bot)
