Skip to content

Reuse grpc buffer in querier's store-gateway stream#7519

Draft
eeldaly wants to merge 2 commits into
cortexproject:masterfrom
eeldaly:querier-grpc
Draft

Reuse grpc buffer in querier's store-gateway stream#7519
eeldaly wants to merge 2 commits into
cortexproject:masterfrom
eeldaly:querier-grpc

Conversation

@eeldaly
Copy link
Copy Markdown
Contributor

@eeldaly eeldaly commented May 14, 2026

What this PR does:
Reuse grpc buffer in querier's store-gateway stream to reduce memory usage in large queries

Current heap dump during heavy queries:
image

GRPC Store-Gateway Bench Test NEW:

BenchmarkGrpcStoreGatewayCalls-10    	    1501	    929363 ns/op	   59943 B/op	     455 allocs/op
BenchmarkGrpcStoreGatewayCalls-10    	    1723	    693808 ns/op	   49132 B/op	     456 allocs/op
BenchmarkGrpcStoreGatewayCalls-10    	    1818	    649024 ns/op	   49703 B/op	     454 allocs/op
BenchmarkGrpcStoreGatewayCalls-10    	    1516	    949340 ns/op	   46230 B/op	     453 allocs/op
BenchmarkGrpcStoreGatewayCalls-10    	    1765	    710793 ns/op	   50579 B/op	     453 allocs/op
BenchmarkGrpcStoreGatewayCalls-10    	    1670	    658193 ns/op	   50318 B/op	     458 allocs/op

GRPC Store-Gateway Bench Test OLD:

BenchmarkGrpcStoreGatewayCalls-10    	    1869	    717663 ns/op	  553930 B/op	     482 allocs/op
BenchmarkGrpcStoreGatewayCalls-10    	    1275	    849913 ns/op	  561863 B/op	     477 allocs/op
BenchmarkGrpcStoreGatewayCalls-10    	    1290	    834707 ns/op	  567643 B/op	     478 allocs/op
BenchmarkGrpcStoreGatewayCalls-10    	    1356	   1100831 ns/op	  560738 B/op	     477 allocs/op
BenchmarkGrpcStoreGatewayCalls-10    	    1146	   1179173 ns/op	  573945 B/op	     477 allocs/op
BenchmarkGrpcStoreGatewayCalls-10    	    1335	    875292 ns/op	  556640 B/op	     481 allocs/op

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
  • docs/configuration/v1-guarantees.md updated if this PR introduces experimental flags

way stream

Signed-off-by: Essam Eldaly <eeldaly@amazon.com>
Signed-off-by: Essam Eldaly <eeldaly@amazon.com>
//
// With SeriesResponse implementing ReleasableMessage, calling Free() after each Recv()
// returns the unmarshal buffer to the pool, reducing per-message allocations by ~32KB.
func BenchmarkGrpcStoreGatewayCalls(b *testing.B) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems not the right place for benchmarking store gateways. Can you move to its own folder?


// MessageWithBufRef holds a reference to gRPC unmarshal buffers for explicit lifecycle management.
// It satisfies cortexpb.ReleasableMessage via structural typing.
type MessageWithBufRef struct {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't change Thanos via vendor

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change upstream if this helps, still in process of testing and wanted to get opinions on draft pr since I'm not very familiar with grpc

@harry671003
Copy link
Copy Markdown
Contributor

I think we should also apply this fix in distributor_queryable.go when querying from ingesters. I remember seeing a similar kind of heap profile for ingester queries as well.

}

// Buffer reference for explicit memory management via cortexCodec.
cortexpb.MessageWithBufRef Ref = 1001 [(gogoproto.embed) = true, (gogoproto.customtype) = "github.com/cortexproject/cortex/pkg/cortexpb.MessageWithBufRef", (gogoproto.nullable) = false];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will a problem. Can we avoid having upstream changes in Thanos. We can't have this circular dependency with Thanos importing Cortex.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you scope this PR down to just the detach? That alone fixes the OOM (buffer gets GC'd once nothing references it). The pool-reuse part needs the proto changes which create a circular dep with Thanos.

In parallel we can open an upstream thanos PR to add a generic BufRef to SeriesResponse — once vendored back, we can swap detach for Free() to get the alloc wins.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants