## Execution-Time Benchmarks Report ⏱️

Execution-time results for samples comparing This PR (8450) and master.

✅ No regressions detected - check the details below

### Full Metrics Comparison

- FakeDbCommand
- HttpMessageHandler
#### Comparison explanation

Execution-time benchmarks measure the whole time it takes to execute a program, and are intended to measure the one-off costs. Cases where the execution-time results for the PR are worse than the latest master results are highlighted in **red**. The following thresholds were used for comparing the execution times:
Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard. Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph).

### Duration charts

#### FakeDbCommand (.NET Framework 4.8)

```mermaid
gantt
title Execution time (ms) FakeDbCommand (.NET Framework 4.8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8450) - mean (74ms) : 70, 79
master - mean (73ms) : 70, 76
section Bailout
This PR (8450) - mean (77ms) : 75, 79
master - mean (77ms) : 75, 79
section CallTarget+Inlining+NGEN
This PR (8450) - mean (1,078ms) : 1033, 1123
master - mean (1,077ms) : 1031, 1124
```
#### FakeDbCommand (.NET Core 3.1)

```mermaid
gantt
title Execution time (ms) FakeDbCommand (.NET Core 3.1)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8450) - mean (117ms) : 111, 122
master - mean (115ms) : 110, 120
section Bailout
This PR (8450) - mean (116ms) : 114, 118
master - mean (115ms) : 112, 118
section CallTarget+Inlining+NGEN
This PR (8450) - mean (796ms) : 772, 820
master - mean (804ms) : 776, 831
```
#### FakeDbCommand (.NET 6)

```mermaid
gantt
title Execution time (ms) FakeDbCommand (.NET 6)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8450) - mean (100ms) : 96, 104
master - mean (100ms) : 97, 103
section Bailout
This PR (8450) - mean (103ms) : 98, 108
master - mean (104ms) : 99, 109
section CallTarget+Inlining+NGEN
This PR (8450) - mean (937ms) : 900, 975
master - mean (938ms) : 901, 975
```
#### FakeDbCommand (.NET 8)

```mermaid
gantt
title Execution time (ms) FakeDbCommand (.NET 8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8450) - mean (103ms) : 96, 109
master - mean (102ms) : 97, 108
section Bailout
This PR (8450) - mean (102ms) : 99, 105
master - mean (103ms) : 97, 108
section CallTarget+Inlining+NGEN
This PR (8450) - mean (829ms) : 784, 874
master - mean (827ms) : 792, 863
```
#### HttpMessageHandler (.NET Framework 4.8)

```mermaid
gantt
title Execution time (ms) HttpMessageHandler (.NET Framework 4.8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8450) - mean (209ms) : 204, 214
master - mean (210ms) : 204, 216
section Bailout
This PR (8450) - mean (214ms) : 209, 219
master - mean (213ms) : 210, 216
section CallTarget+Inlining+NGEN
This PR (8450) - mean (1,229ms) : 1184, 1275
master - mean (1,229ms) : 1181, 1277
```
#### HttpMessageHandler (.NET Core 3.1)

```mermaid
gantt
title Execution time (ms) HttpMessageHandler (.NET Core 3.1)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8450) - mean (303ms) : 295, 311
master - mean (304ms) : 296, 312
section Bailout
This PR (8450) - mean (305ms) : 297, 312
master - mean (304ms) : 297, 311
section CallTarget+Inlining+NGEN
This PR (8450) - mean (1,012ms) : 985, 1040
master - mean (1,006ms) : 979, 1033
```
#### HttpMessageHandler (.NET 6)

```mermaid
gantt
title Execution time (ms) HttpMessageHandler (.NET 6)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8450) - mean (299ms) : 292, 307
master - mean (298ms) : 287, 308
section Bailout
This PR (8450) - mean (299ms) : 293, 306
master - mean (297ms) : 291, 304
section CallTarget+Inlining+NGEN
This PR (8450) - mean (1,180ms) : 1147, 1213
master - mean (1,183ms) : 1145, 1221
```
#### HttpMessageHandler (.NET 8)

```mermaid
gantt
title Execution time (ms) HttpMessageHandler (.NET 8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8450) - mean (296ms) : 287, 305
master - mean (297ms) : 291, 303
section Bailout
This PR (8450) - mean (296ms) : 291, 301
master - mean (299ms) : 288, 310
section CallTarget+Inlining+NGEN
This PR (8450) - mean (1,076ms) : 994, 1159
master - mean (1,077ms) : 977, 1178
```
## Benchmarks

Benchmark execution time: 2026-04-16 20:08:58

Comparing candidate commit bb42028 in PR branch

Found 0 performance improvements and 0 performance regressions! Performance is the same for 27 metrics, 0 unstable metrics, 87 known flaky benchmarks.
b0dfa05 to ec5f36b
## DSM Per-Message Overhead Optimizations

### Summary of changes

- `EdgeTagCache<TKey>` and `BacklogTagCache<TKey>` — process-wide, per-type `ConcurrentDictionary` caches that intern edge-tag arrays and backlog-tag strings so they are only allocated once per unique key (topic/group/cluster combination).
- A `NodeHashCacheEntry`/`NodeHashSnapshot` mechanism inside `DataStreamsManager` that memoizes the expensive `CalculateNodeHash` result per `(edgeTags[], nodeHashBase)` pair. Reads are lock-free via a volatile field; writes acquire a per-entry lock only on a cache miss or base change.
- `PathwayContextEncoder.EncodeInto` and a `Span<byte>`-based `Decode` overload; `DataStreamsContextPropagator` uses `stackalloc` buffers on .NET Core 3.1+ to avoid intermediate `byte[]` heap allocations on every produce/consume.
- `DataStreamsAggregator` and `DataStreamsManager._nodeHashCache` now use reference-equality comparers backed by `RuntimeHelpers.GetHashCode`, which is safe because all keys are interned by the caches above.
- Replaced the `Thread.Sleep` polling loop in `DataStreamsWriter` with a `ManualResetEventSlim` that wakes immediately when the queue reaches 1,000 items or after a 500 ms timeout, eliminating unnecessary context switches.
- New `readonly struct` cache keys (`ConsumeEdgeTagCacheKey`, `ProduceEdgeTagCacheKey`, `CommitBacklogTagCacheKey`, `ProduceBacklogTagCacheKey`) for Kafka; equivalent structs for AWS SQS/SNS/Kinesis, Azure Service Bus, IBM MQ, and RabbitMQ.
- The `Remove(TemporaryBase64PathwayContext)` header scan is now skipped when `KafkaCreateConsumerScopeEnabled=true` (the default), avoiding an O(n) scan on every message.
- `LastConsumePathway` guard removed: dropped the redundant `!= null` guard on the produce path that required an `AsyncLocal` read before the actual `AsyncLocal` read.

### Reason for change
DSM instrumentation runs on the hot path of every instrumented message. Profiling revealed that the dominant per-message costs were:

- a new `string[]` edge-tag array allocated on every produce/consume call;
- a `CalculateNodeHash` call (hashing over all edge tags) on every checkpoint;
- `byte[]` arrays allocated for pathway context Base64 encoding/decoding.

These optimizations target p99 and throughput benchmarks for Kafka, SQS, SNS, RabbitMQ, IBM MQ, Azure Service Bus, and Kinesis instrumentation.
### Implementation details

#### Caching strategy
`EdgeTagCache<TKey>` and `BacklogTagCache<TKey>` use the static-generic-class pattern (a `static class Foo<T>` with a static field) to give each integration its own dictionary instance without any runtime dispatch. The key type is a `readonly struct` implementing `IEquatable<TKey>`, which prevents boxing in `ConcurrentDictionary` lookups.

The caches are bounded at `MaxEdgeTagCacheSize = 1000` entries. Once that limit is reached, new keys are computed on the fly (no caching) to prevent unbounded memory growth from high-cardinality identifiers.
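The pattern described above can be sketched as follows. This is an illustrative reconstruction, not the PR's actual code: the type and member names (`GetOrAdd`, `ProduceEdgeTagKey`, the factory delegate) are assumptions based on the description.

```csharp
using System;
using System.Collections.Concurrent;

// Static-generic-class pattern: each distinct TKey type gets its own static
// ConcurrentDictionary instance, resolved at JIT time with no runtime dispatch.
internal static class EdgeTagCache<TKey>
    where TKey : struct, IEquatable<TKey>
{
    // Bound described in the PR: past this size, keys are computed on the fly.
    private const int MaxEdgeTagCacheSize = 1000;

    private static readonly ConcurrentDictionary<TKey, string[]> Cache = new();

    public static string[] GetOrAdd(TKey key, Func<TKey, string[]> factory)
    {
        // Fast path: return the interned array so repeated calls with the same
        // key yield the same reference (enables reference-equality hashing).
        if (Cache.TryGetValue(key, out var cached))
        {
            return cached;
        }

        // High-cardinality protection: bypass caching once the bound is hit.
        if (Cache.Count >= MaxEdgeTagCacheSize)
        {
            return factory(key);
        }

        return Cache.GetOrAdd(key, factory);
    }
}

// Illustrative readonly struct key; implementing IEquatable<T> is what avoids
// boxing inside ConcurrentDictionary lookups.
internal readonly struct ProduceEdgeTagKey : IEquatable<ProduceEdgeTagKey>
{
    public readonly string Topic;

    public ProduceEdgeTagKey(string topic) => Topic = topic;

    public bool Equals(ProduceEdgeTagKey other) => Topic == other.Topic;

    public override bool Equals(object obj) => obj is ProduceEdgeTagKey other && Equals(other);

    public override int GetHashCode() => Topic == null ? 0 : Topic.GetHashCode();
}
```

Because every lookup with the same key returns the same `string[]` instance, downstream consumers can key on array identity rather than array contents.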
#### Node-hash caching

`_nodeHashCache` is keyed by `string[]` identity (not value equality) because the arrays themselves are interned by `EdgeTagCache<TKey>`. Each entry holds a volatile `NodeHashSnapshot` (`nodeHashBase` + `NodeHash`). On every checkpoint the snapshot is read without locking; the per-entry lock is taken only on a cache miss or when the base changes.
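A minimal sketch of this memoization scheme, with illustrative names and a stand-in hash function (the real `CalculateNodeHash` and field layout are not shown in this PR description):

```csharp
using System;

// Immutable snapshot published through a volatile field: readers either see the
// whole (base, hash) pair or none of it, so no read-side locking is needed.
internal sealed class NodeHashSnapshot
{
    public readonly ulong NodeHashBase;
    public readonly ulong NodeHash;

    public NodeHashSnapshot(ulong nodeHashBase, ulong nodeHash)
    {
        NodeHashBase = nodeHashBase;
        NodeHash = nodeHash;
    }
}

internal sealed class NodeHashCacheEntry
{
    private readonly object _writeLock = new();
    private volatile NodeHashSnapshot _snapshot;

    public ulong GetOrCompute(ulong nodeHashBase, string[] edgeTags, Func<ulong, string[], ulong> calculateNodeHash)
    {
        // Lock-free fast path: one volatile read of an immutable snapshot.
        var snap = _snapshot;
        if (snap != null && snap.NodeHashBase == nodeHashBase)
        {
            return snap.NodeHash;
        }

        // Slow path (cache miss or base change): recompute under the
        // per-entry lock, then publish the new snapshot.
        lock (_writeLock)
        {
            snap = _snapshot;
            if (snap == null || snap.NodeHashBase != nodeHashBase)
            {
                snap = new NodeHashSnapshot(nodeHashBase, calculateNodeHash(nodeHashBase, edgeTags));
                _snapshot = snap;
            }

            return snap.NodeHash;
        }
    }
}
```

The outer dictionary mapping `string[]` to entries can safely use a reference-equality comparer precisely because the arrays are interned by the edge-tag cache: two checkpoints on the same pathway always present the same array instance.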
#### Zero-allocation encode/decode

`PathwayContextEncoder.EncodeInto(PathwayContext, Span<byte>)` writes directly into a caller-supplied buffer. `DataStreamsContextPropagator` `stackalloc`s `MaxEncodedSize` (26 bytes) and `MaxBase64EncodedSize` (36 bytes) on the stack and uses `Base64.EncodeToUtf8`/`DecodeFromUtf8` in place. The only unavoidable allocation is the final `ToArray()` passed to `headers.Add`, because Kafka takes ownership of the byte array.

This path is guarded by `#if NETCOREAPP3_1_OR_GREATER`; .NET Framework falls back to the original heap-allocating path.
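The encode path can be sketched as below. This is an assumption-laden illustration: the buffer sizes follow the description, but the payload copy stands in for the real `PathwayContextEncoder.EncodeInto`, and the preprocessor guard is noted in a comment rather than applied.

```csharp
using System;
using System.Buffers.Text;

// Sketch of the stackalloc-based header encode path. In the PR this code is
// compiled only under #if NETCOREAPP3_1_OR_GREATER; .NET Framework keeps the
// old heap-allocating path.
internal static class PathwayHeaderSketch
{
    private const int MaxEncodedSize = 26;       // max raw encoded pathway context
    private const int MaxBase64EncodedSize = 36; // Base64 of 26 bytes fits in 36

    public static byte[] EncodeHeader(ReadOnlySpan<byte> contextBytes)
    {
        // Both scratch buffers live on the stack: no intermediate byte[].
        Span<byte> raw = stackalloc byte[MaxEncodedSize];
        contextBytes.CopyTo(raw); // stand-in for PathwayContextEncoder.EncodeInto

        Span<byte> base64 = stackalloc byte[MaxBase64EncodedSize];
        Base64.EncodeToUtf8(raw.Slice(0, contextBytes.Length), base64, out _, out int written);

        // The one unavoidable allocation: the header consumer (e.g. Kafka)
        // takes ownership of the array, so it must be freshly allocated.
        return base64.Slice(0, written).ToArray();
    }
}
```

Decoding mirrors this with `Base64.DecodeFromUtf8` into a second `stackalloc` buffer before reading the context fields back out.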
#### Drain signal

`DataStreamsWriter` previously slept 10 ms unconditionally between drain iterations, burning CPU and adding up to ~10 ms of latency per batch even under load. The new `ManualResetEventSlim` is signalled immediately when either queue exceeds `DrainThreshold` (1,000 items), capping worst-case latency at `DrainTimeoutMs` (500 ms) while eliminating idle wakeups.
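The signalling scheme can be sketched like this; the class and method names are hypothetical, and only one queue is shown where the real writer watches several.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Sketch of the drain signal: the writer blocks on a ManualResetEventSlim
// instead of polling with Thread.Sleep, waking immediately once the queue
// passes DrainThreshold, or after DrainTimeoutMs at the latest.
internal sealed class DrainSignalSketch
{
    private const int DrainThreshold = 1000;
    private const int DrainTimeoutMs = 500;

    private readonly ConcurrentQueue<string> _queue = new();
    private readonly ManualResetEventSlim _drainSignal = new(initialState: false);

    public void Enqueue(string item)
    {
        _queue.Enqueue(item);
        if (_queue.Count >= DrainThreshold)
        {
            _drainSignal.Set(); // wake the drain loop right away
        }
    }

    // One iteration of the drain loop: wait for the signal or the timeout,
    // reset the signal, then drain whatever has accumulated.
    public int DrainOnce()
    {
        _drainSignal.Wait(DrainTimeoutMs);
        _drainSignal.Reset();

        int drained = 0;
        while (_queue.TryDequeue(out _))
        {
            drained++;
        }

        return drained;
    }
}
```

Under light load the loop still wakes at most every 500 ms, but a burst that crosses the threshold is flushed immediately instead of waiting out the timer.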
### Test coverage

- `DataStreamsManagerTests`: new unit tests verify that `GetOrCreateEdgeTags` and `GetOrCreateBacklogTags` return the same array/string reference on repeated calls with the same key, and distinct references for different keys. Tests cover Kafka produce/consume, RabbitMQ produce/consume, and generic key types.
- `PathwayContextEncoderTests`: existing encode/decode round-trip tests pass against the new `Span<byte>` overloads.
### Other details

- The `MaxEdgeTagCacheSize` constant is `internal` to allow unit tests to verify the overflow/bypass behavior.
- .NET Framework code paths are unchanged — all `Span`-based optimizations are gated behind `#if NETCOREAPP3_1_OR_GREATER`.