## Execution-Time Benchmarks Report ⏱️

Execution-time results for samples comparing This PR (8450) and master.

✅ No regressions detected - check the details below

### Full Metrics Comparison

- FakeDbCommand
- HttpMessageHandler
#### Comparison explanation

Execution-time benchmarks measure the whole time it takes to execute a program, and are intended to measure the one-off costs. Cases where the execution-time results for the PR are worse than the latest master results are highlighted in **red**. The following thresholds were used for comparing the execution times:
Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard. Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph).

### Duration charts

#### FakeDbCommand (.NET Framework 4.8)

```mermaid
gantt
title Execution time (ms) FakeDbCommand (.NET Framework 4.8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8450) - mean (74ms) : 70, 79
master - mean (73ms) : 70, 76
section Bailout
This PR (8450) - mean (77ms) : 75, 79
master - mean (77ms) : 75, 79
section CallTarget+Inlining+NGEN
This PR (8450) - mean (1,078ms) : 1033, 1123
master - mean (1,077ms) : 1031, 1124
```
#### FakeDbCommand (.NET Core 3.1)

```mermaid
gantt
title Execution time (ms) FakeDbCommand (.NET Core 3.1)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8450) - mean (117ms) : 111, 122
master - mean (115ms) : 110, 120
section Bailout
This PR (8450) - mean (116ms) : 114, 118
master - mean (115ms) : 112, 118
section CallTarget+Inlining+NGEN
This PR (8450) - mean (796ms) : 772, 820
master - mean (804ms) : 776, 831
```
#### FakeDbCommand (.NET 6)

```mermaid
gantt
title Execution time (ms) FakeDbCommand (.NET 6)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8450) - mean (100ms) : 96, 104
master - mean (100ms) : 97, 103
section Bailout
This PR (8450) - mean (103ms) : 98, 108
master - mean (104ms) : 99, 109
section CallTarget+Inlining+NGEN
This PR (8450) - mean (937ms) : 900, 975
master - mean (938ms) : 901, 975
```
#### FakeDbCommand (.NET 8)

```mermaid
gantt
title Execution time (ms) FakeDbCommand (.NET 8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8450) - mean (103ms) : 96, 109
master - mean (102ms) : 97, 108
section Bailout
This PR (8450) - mean (102ms) : 99, 105
master - mean (103ms) : 97, 108
section CallTarget+Inlining+NGEN
This PR (8450) - mean (829ms) : 784, 874
master - mean (827ms) : 792, 863
```
#### HttpMessageHandler (.NET Framework 4.8)

```mermaid
gantt
title Execution time (ms) HttpMessageHandler (.NET Framework 4.8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8450) - mean (209ms) : 204, 214
master - mean (210ms) : 204, 216
section Bailout
This PR (8450) - mean (214ms) : 209, 219
master - mean (213ms) : 210, 216
section CallTarget+Inlining+NGEN
This PR (8450) - mean (1,229ms) : 1184, 1275
master - mean (1,229ms) : 1181, 1277
```
#### HttpMessageHandler (.NET Core 3.1)

```mermaid
gantt
title Execution time (ms) HttpMessageHandler (.NET Core 3.1)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8450) - mean (303ms) : 295, 311
master - mean (304ms) : 296, 312
section Bailout
This PR (8450) - mean (305ms) : 297, 312
master - mean (304ms) : 297, 311
section CallTarget+Inlining+NGEN
This PR (8450) - mean (1,012ms) : 985, 1040
master - mean (1,006ms) : 979, 1033
```
#### HttpMessageHandler (.NET 6)

```mermaid
gantt
title Execution time (ms) HttpMessageHandler (.NET 6)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8450) - mean (299ms) : 292, 307
master - mean (298ms) : 287, 308
section Bailout
This PR (8450) - mean (299ms) : 293, 306
master - mean (297ms) : 291, 304
section CallTarget+Inlining+NGEN
This PR (8450) - mean (1,180ms) : 1147, 1213
master - mean (1,183ms) : 1145, 1221
```
#### HttpMessageHandler (.NET 8)

```mermaid
gantt
title Execution time (ms) HttpMessageHandler (.NET 8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8450) - mean (296ms) : 287, 305
master - mean (297ms) : 291, 303
section Bailout
This PR (8450) - mean (296ms) : 291, 301
master - mean (299ms) : 288, 310
section CallTarget+Inlining+NGEN
This PR (8450) - mean (1,076ms) : 994, 1159
master - mean (1,077ms) : 977, 1178
```
## Benchmarks

Benchmark execution time: 2026-04-16 20:08:58

Comparing candidate commit bb42028 in PR branch

Found 0 performance improvements and 0 performance regressions! Performance is the same for 27 metrics, 0 unstable metrics, 87 known flaky benchmarks.
b0dfa05 to ec5f36b
## DSM Per-Message Overhead Optimizations

### Summary of changes

- `EdgeTagCache<TKey>` and `BacklogTagCache<TKey>` — process-wide, per-type `ConcurrentDictionary` caches that intern edge-tag arrays and backlog-tag strings so they are only allocated once per unique key (topic/group/cluster combination).
- A `NodeHashCacheEntry`/`NodeHashSnapshot` mechanism inside `DataStreamsManager` that memoizes the expensive `CalculateNodeHash` result per `(edgeTags[], nodeHashBase)` pair. Reads are lock-free via a volatile field; writes acquire a per-entry lock only on a cache miss or base change.
- `PathwayContextEncoder.EncodeInto` and a `Span<byte>`-based `Decode` overload; `DataStreamsContextPropagator` uses `stackalloc` buffers on .NET Core 3.1+ to avoid intermediate `byte[]` heap allocations on every produce/consume.
- `DataStreamsAggregator` and `DataStreamsManager._nodeHashCache` now use reference-equality comparers backed by `RuntimeHelpers.GetHashCode`, which is safe because all keys are interned by the caches above.
- Replaced the `Thread.Sleep` polling loop in `DataStreamsWriter` with a `ManualResetEventSlim` that wakes immediately when the queue reaches 1,000 items or after a 500 ms timeout, eliminating unnecessary context switches.
- New `readonly struct` cache keys (`ConsumeEdgeTagCacheKey`, `ProduceEdgeTagCacheKey`, `CommitBacklogTagCacheKey`, `ProduceBacklogTagCacheKey`) for Kafka; equivalent structs for AWS SQS/SNS/Kinesis, Azure Service Bus, IBM MQ, and RabbitMQ.
- The `Remove(TemporaryBase64PathwayContext)` header scan is now skipped when `KafkaCreateConsumerScopeEnabled=true` (the default), avoiding an O(n) scan on every message.
- `LastConsumePathway` guard removed: dropped the redundant `!= null` guard on the produce path that required an `AsyncLocal` read before the actual `AsyncLocal` read.

### Reason for change
DSM instrumentation runs on the hot path of every instrumented message. Profiling revealed that the dominant per-message costs were:

- a new `string[]` edge-tag array allocated on every produce/consume call;
- a `CalculateNodeHash` call (hashing over all edge tags) on every checkpoint;
- `byte[]` arrays allocated for pathway context Base64 encoding/decoding.

These optimizations target p99 and throughput benchmarks for Kafka, SQS, SNS, RabbitMQ, IBM MQ, Azure Service Bus, and Kinesis instrumentation.
### Implementation details

#### Caching strategy
`EdgeTagCache<TKey>` and `BacklogTagCache<TKey>` use the static-generic-class pattern (a `static class Foo<T>` with a static field) to give each integration its own dictionary instance without any runtime dispatch. The key type is a `readonly struct` implementing `IEquatable<TKey>`, which prevents boxing in `ConcurrentDictionary` lookups.

The caches are bounded at `MaxEdgeTagCacheSize = 1000` entries. Once that limit is reached, new keys are computed on the fly (no caching) to prevent unbounded memory growth from high-cardinality identifiers.
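The pattern described above can be sketched as follows. This is an illustrative reconstruction, not the PR's actual code: the type and member names (`GetOrAdd`, `ProduceEdgeTagKey`, the factory delegate) are assumptions based on the description.

```csharp
using System;
using System.Collections.Concurrent;

// Static-generic-class pattern: each distinct TKey type gets its own static
// ConcurrentDictionary instance, resolved at JIT time with no runtime dispatch.
internal static class EdgeTagCache<TKey>
    where TKey : struct, IEquatable<TKey>
{
    // Bound described in the PR: past this size, keys are computed on the fly.
    private const int MaxEdgeTagCacheSize = 1000;

    private static readonly ConcurrentDictionary<TKey, string[]> Cache = new();

    public static string[] GetOrAdd(TKey key, Func<TKey, string[]> factory)
    {
        // Fast path: return the interned array so repeated calls with the same
        // key yield the same reference (enables reference-equality hashing).
        if (Cache.TryGetValue(key, out var cached))
        {
            return cached;
        }

        // High-cardinality protection: bypass caching once the bound is hit.
        if (Cache.Count >= MaxEdgeTagCacheSize)
        {
            return factory(key);
        }

        return Cache.GetOrAdd(key, factory);
    }
}

// Illustrative readonly struct key; implementing IEquatable<T> is what avoids
// boxing inside ConcurrentDictionary lookups.
internal readonly struct ProduceEdgeTagKey : IEquatable<ProduceEdgeTagKey>
{
    public readonly string Topic;

    public ProduceEdgeTagKey(string topic) => Topic = topic;

    public bool Equals(ProduceEdgeTagKey other) => Topic == other.Topic;

    public override bool Equals(object obj) => obj is ProduceEdgeTagKey other && Equals(other);

    public override int GetHashCode() => Topic == null ? 0 : Topic.GetHashCode();
}
```

Because every lookup with the same key returns the same `string[]` instance, downstream consumers can key on array identity rather than array contents.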
#### Node-hash caching

`_nodeHashCache` is keyed by `string[]` identity (not value equality) because the arrays themselves are interned by `EdgeTagCache<TKey>`. Each entry holds a volatile `NodeHashSnapshot` (`nodeHashBase` + `NodeHash`). On every checkpoint the snapshot is read without locking; the per-entry lock is taken only on a cache miss or when the base changes.
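A minimal sketch of this memoization scheme, with illustrative names and a stand-in hash function (the real `CalculateNodeHash` and field layout are not shown in this PR description):

```csharp
using System;

// Immutable snapshot published through a volatile field: readers either see the
// whole (base, hash) pair or none of it, so no read-side locking is needed.
internal sealed class NodeHashSnapshot
{
    public readonly ulong NodeHashBase;
    public readonly ulong NodeHash;

    public NodeHashSnapshot(ulong nodeHashBase, ulong nodeHash)
    {
        NodeHashBase = nodeHashBase;
        NodeHash = nodeHash;
    }
}

internal sealed class NodeHashCacheEntry
{
    private readonly object _writeLock = new();
    private volatile NodeHashSnapshot _snapshot;

    public ulong GetOrCompute(ulong nodeHashBase, string[] edgeTags, Func<ulong, string[], ulong> calculateNodeHash)
    {
        // Lock-free fast path: one volatile read of an immutable snapshot.
        var snap = _snapshot;
        if (snap != null && snap.NodeHashBase == nodeHashBase)
        {
            return snap.NodeHash;
        }

        // Slow path (cache miss or base change): recompute under the
        // per-entry lock, then publish the new snapshot.
        lock (_writeLock)
        {
            snap = _snapshot;
            if (snap == null || snap.NodeHashBase != nodeHashBase)
            {
                snap = new NodeHashSnapshot(nodeHashBase, calculateNodeHash(nodeHashBase, edgeTags));
                _snapshot = snap;
            }

            return snap.NodeHash;
        }
    }
}
```

The outer dictionary mapping `string[]` to entries can safely use a reference-equality comparer precisely because the arrays are interned by the edge-tag cache: two checkpoints on the same pathway always present the same array instance.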
#### Zero-allocation encode/decode

`PathwayContextEncoder.EncodeInto(PathwayContext, Span<byte>)` writes directly into a caller-supplied buffer. `DataStreamsContextPropagator` `stackalloc`s `MaxEncodedSize` (26 bytes) and `MaxBase64EncodedSize` (36 bytes) on the stack and uses `Base64.EncodeToUtf8`/`DecodeFromUtf8` in place. The only unavoidable allocation is the final `ToArray()` passed to `headers.Add`, because Kafka takes ownership of the byte array.

This path is guarded by `#if NETCOREAPP3_1_OR_GREATER`; .NET Framework falls back to the original heap-allocating path.
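The encode path can be sketched as below. This is an assumption-laden illustration: the buffer sizes follow the description, but the payload copy stands in for the real `PathwayContextEncoder.EncodeInto`, and the preprocessor guard is noted in a comment rather than applied.

```csharp
using System;
using System.Buffers.Text;

// Sketch of the stackalloc-based header encode path. In the PR this code is
// compiled only under #if NETCOREAPP3_1_OR_GREATER; .NET Framework keeps the
// old heap-allocating path.
internal static class PathwayHeaderSketch
{
    private const int MaxEncodedSize = 26;       // max raw encoded pathway context
    private const int MaxBase64EncodedSize = 36; // Base64 of 26 bytes fits in 36

    public static byte[] EncodeHeader(ReadOnlySpan<byte> contextBytes)
    {
        // Both scratch buffers live on the stack: no intermediate byte[].
        Span<byte> raw = stackalloc byte[MaxEncodedSize];
        contextBytes.CopyTo(raw); // stand-in for PathwayContextEncoder.EncodeInto

        Span<byte> base64 = stackalloc byte[MaxBase64EncodedSize];
        Base64.EncodeToUtf8(raw.Slice(0, contextBytes.Length), base64, out _, out int written);

        // The one unavoidable allocation: the header consumer (e.g. Kafka)
        // takes ownership of the array, so it must be freshly allocated.
        return base64.Slice(0, written).ToArray();
    }
}
```

Decoding mirrors this with `Base64.DecodeFromUtf8` into a second `stackalloc` buffer before reading the context fields back out.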
#### Drain signal

`DataStreamsWriter` previously slept 10 ms unconditionally between drain iterations, burning CPU and adding up to ~10 ms of latency per batch even under load. The new `ManualResetEventSlim` is signalled immediately when either queue exceeds `DrainThreshold` (1,000 items), capping worst-case latency at `DrainTimeoutMs` (500 ms) while eliminating idle wakeups.
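The signalling scheme can be sketched like this; the class and method names are hypothetical, and only one queue is shown where the real writer watches several.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Sketch of the drain signal: the writer blocks on a ManualResetEventSlim
// instead of polling with Thread.Sleep, waking immediately once the queue
// passes DrainThreshold, or after DrainTimeoutMs at the latest.
internal sealed class DrainSignalSketch
{
    private const int DrainThreshold = 1000;
    private const int DrainTimeoutMs = 500;

    private readonly ConcurrentQueue<string> _queue = new();
    private readonly ManualResetEventSlim _drainSignal = new(initialState: false);

    public void Enqueue(string item)
    {
        _queue.Enqueue(item);
        if (_queue.Count >= DrainThreshold)
        {
            _drainSignal.Set(); // wake the drain loop right away
        }
    }

    // One iteration of the drain loop: wait for the signal or the timeout,
    // reset the signal, then drain whatever has accumulated.
    public int DrainOnce()
    {
        _drainSignal.Wait(DrainTimeoutMs);
        _drainSignal.Reset();

        int drained = 0;
        while (_queue.TryDequeue(out _))
        {
            drained++;
        }

        return drained;
    }
}
```

Under light load the loop still wakes at most every 500 ms, but a burst that crosses the threshold is flushed immediately instead of waiting out the timer.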
### Test coverage

- `DataStreamsManagerTests`: new unit tests verify that `GetOrCreateEdgeTags` and `GetOrCreateBacklogTags` return the same array/string reference on repeated calls with the same key, and distinct references for different keys. Tests cover Kafka produce/consume, RabbitMQ produce/consume, and generic key types.
- `PathwayContextEncoderTests`: existing encode/decode round-trip tests pass against the new `Span<byte>` overloads.
### Other details

- The `MaxEdgeTagCacheSize` constant is `internal` to allow unit tests to verify the overflow/bypass behavior.
- .NET Framework code paths are unchanged — all `Span`-based optimizations are gated behind `#if NETCOREAPP3_1_OR_GREATER`.