Improve peer-tag calculation fast-path performance#8445
Improve peer-tag calculation fast-path performance#8445andrewlock wants to merge 8 commits intoandrew/client-side-stats/fnvhashfrom
Conversation
Execution-Time Benchmarks Report ⏱️Execution-time results for samples comparing This PR (8445) and master. ✅ No regressions detected - check the details below Full Metrics ComparisonFakeDbCommand
HttpMessageHandler
Comparison explanationExecution-time benchmarks measure the whole time it takes to execute a program, and are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are highlighted in **red**. The following thresholds were used for comparing the execution times:
Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard. Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph). Duration chartsFakeDbCommand (.NET Framework 4.8)gantt
title Execution time (ms) FakeDbCommand (.NET Framework 4.8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8445) - mean (71ms) : 69, 74
master - mean (72ms) : 70, 74
section Bailout
This PR (8445) - mean (76ms) : 75, 78
master - mean (76ms) : 74, 78
section CallTarget+Inlining+NGEN
This PR (8445) - mean (1,067ms) : 1029, 1105
master - mean (1,076ms) : 1010, 1141
FakeDbCommand (.NET Core 3.1)gantt
title Execution time (ms) FakeDbCommand (.NET Core 3.1)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8445) - mean (112ms) : 108, 116
master - mean (114ms) : 110, 117
section Bailout
This PR (8445) - mean (112ms) : 110, 115
master - mean (115ms) : 112, 118
section CallTarget+Inlining+NGEN
This PR (8445) - mean (782ms) : 761, 803
master - mean (784ms) : 764, 805
FakeDbCommand (.NET 6)gantt
title Execution time (ms) FakeDbCommand (.NET 6)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8445) - mean (99ms) : 95, 102
master - mean (100ms) : 96, 103
section Bailout
This PR (8445) - mean (101ms) : 98, 103
master - mean (101ms) : 98, 104
section CallTarget+Inlining+NGEN
This PR (8445) - mean (931ms) : 896, 967
master - mean (932ms) : 899, 964
FakeDbCommand (.NET 8)gantt
title Execution time (ms) FakeDbCommand (.NET 8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8445) - mean (98ms) : 95, 101
master - mean (98ms) : 94, 103
section Bailout
This PR (8445) - mean (99ms) : 97, 101
master - mean (99ms) : 97, 101
section CallTarget+Inlining+NGEN
This PR (8445) - mean (814ms) : 777, 851
master - mean (814ms) : 781, 848
HttpMessageHandler (.NET Framework 4.8)gantt
title Execution time (ms) HttpMessageHandler (.NET Framework 4.8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8445) - mean (192ms) : 187, 198
master - mean (191ms) : 187, 194
section Bailout
This PR (8445) - mean (196ms) : 192, 200
master - mean (194ms) : 192, 196
section CallTarget+Inlining+NGEN
This PR (8445) - mean (1,141ms) : 1092, 1189
master - mean (1,142ms) : 1100, 1184
HttpMessageHandler (.NET Core 3.1)gantt
title Execution time (ms) HttpMessageHandler (.NET Core 3.1)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8445) - mean (273ms) : 269, 276
master - mean (273ms) : 269, 277
section Bailout
This PR (8445) - mean (274ms) : 270, 278
master - mean (273ms) : 270, 276
section CallTarget+Inlining+NGEN
This PR (8445) - mean (924ms) : 901, 947
master - mean (924ms) : 895, 953
HttpMessageHandler (.NET 6)gantt
title Execution time (ms) HttpMessageHandler (.NET 6)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8445) - mean (267ms) : 264, 270
master - mean (267ms) : 262, 272
section Bailout
This PR (8445) - mean (267ms) : 264, 270
master - mean (266ms) : 264, 268
section CallTarget+Inlining+NGEN
This PR (8445) - mean (1,137ms) : 1096, 1178
master - mean (1,133ms) : 1090, 1176
HttpMessageHandler (.NET 8)gantt
title Execution time (ms) HttpMessageHandler (.NET 8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8445) - mean (265ms) : 261, 269
master - mean (264ms) : 261, 267
section Bailout
This PR (8445) - mean (265ms) : 263, 268
master - mean (265ms) : 261, 268
section CallTarget+Inlining+NGEN
This PR (8445) - mean (1,016ms) : 981, 1052
master - mean (1,019ms) : 980, 1058
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
BenchmarksBenchmark execution time: 2026-04-16 09:06:29 Comparing candidate commit 7817c48 in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 27 metrics, 0 unstable metrics, 87 known flaky benchmarks.
|
| if (config.PeerTags is not null) | ||
| if (CanComputeStats.Value) | ||
| { | ||
| Log.Debug("Stats computation has been enabled."); |
There was a problem hiding this comment.
| Log.Debug("Stats computation has been enabled."); | |
| Log.Debug("Stats computation enabled."); |
There was a problem hiding this comment.
😂 That's the same log as was there before, I just moved it, but sure
| else | ||
| { | ||
| Interlocked.Exchange(ref _peerTagKeys, config.PeerTags); | ||
| Log.Warning("Stats computation disabled because the detected agent does not support this feature."); |
There was a problem hiding this comment.
We're almost certainly going to need an override for this when using the Rust agents, but that's a separate issue...
There was a problem hiding this comment.
Yeah, that whole question of how we identify the rust agent is a whole open question...
| var spanMetaStructs = jObject["span_meta_structs"]?.Value<bool>() ?? false; | ||
| var spanEvents = jObject["span_events"]?.Value<bool>() ?? false; | ||
| var peerTags = (jObject["peer_tags"] as JArray)?.Values<string>().Where(x => !string.IsNullOrEmpty(x)).Distinct().OrderBy(x => x).ToList(); | ||
| var peerTags = (jObject["peer_tags"] as JArray)?.Values<string>().ToList(); |
58d4473 to
9daade0
Compare
e94c3c9 to
9b41576
Compare
9daade0 to
7817c48
Compare
9b41576 to
25dabec
Compare
Summary of changes
Improves the performance of peer-tag hash calculation for "fast path" cases (where there's an existing bucket)
Reason for change
We have to calculate the hash of all peer tags as part of client-side-stats bucketing calculations, which involves encoding them as utf-8. Additionally, when we send the buckets, we have to send the tags as utf-8.
In the initial CSS 1.2.0 implementation, we encoded the tags every-time we ran a calculation, which would allocate for every span that had peer tags, generally quite expensive. In this PR, we switch to doing the encoding twice: once with a zero-allocation implementation (amortized 0 on .NET Framework) to calculate the hash, and then, if we need the "real" encoded tags, then we do that encoding again.
Implementation details
key:valuestackallocfor .NET Core, and array pool implementation for .NET Framework etcBuildKeytoGetEncodedPeerTags(i.e. is this a "base service" only tag, if so, what's the tag value, otherwise how big does the peer tag list need to be)There are some possible future optimizations, not implemented in this PR:
tag:value, allow writing the pre-computedbyte[]toMessagePackBinaryand appending thevalue. This is doable, but requires updating theMessagePackBinaryimplementation to support it, so I considered it out of scope for nowTest coverage
Mostly covered by existing tests, but added some additional unit tests that directly compare the hashing to values used in Go agent tests.
Additionally did some benchmarking. The key thing is that the
ClientSpanWithPeerTagspath is zero-allocation (and it's nice that the slow-path is still lower allocation than before, even if it's slower over all)Details
Other details
Part of a stack:
charhelpers likeIsAsciiHexDigit#8417IpAddressObfuscationUtilfor use with client-side-stats #8418ContainsSpanIdtoSpanCollection#8435FnvHash64.GenerateHash()#8444