Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e52cc41a00
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| { | ||
| if (span.Context.ParentId is null or 0) | ||
| { | ||
| return !filter.ShouldKeepTrace(span); |
There was a problem hiding this comment.
Apply trace filters when chunks lack a parentless root span
Trace filtering only runs when a span has ParentId null/0, so chunks where every span has a non-zero parent (common in distributed traces with propagated context, or some partial chunks) bypass filter_tags/ignore_resources entirely. In those cases IsTraceFiltered returns false and filtered traces are kept and counted in stats, which defeats agent-configured filtering for a real production traffic pattern.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Interesting. We should probably do whatever the Go agent is doing.
There was a problem hiding this comment.
So. This is where things get very interesting 😅 Will discuss this one offline I think 😄
There was a problem hiding this comment.
Mostly fixed this in #8436, though not strictly the same as the Go agent does. Plus this is going to bork with partial flushing (both sides)
BenchmarksBenchmark execution time: 2026-04-16 09:05:29 Comparing candidate commit e9ee48d in PR branch Found 0 performance improvements and 1 performance regressions! Performance is the same for 26 metrics, 0 unstable metrics, 87 known flaky benchmarks.
|
Execution-Time Benchmarks Report ⏱️Execution-time results for samples comparing This PR (8420) and master. ✅ No regressions detected - check the details below Full Metrics ComparisonFakeDbCommand
HttpMessageHandler
Comparison explanationExecution-time benchmarks measure the whole time it takes to execute a program, and are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are highlighted in **red**. The following thresholds were used for comparing the execution times:
Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard. Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph). Duration chartsFakeDbCommand (.NET Framework 4.8)gantt
title Execution time (ms) FakeDbCommand (.NET Framework 4.8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8420) - mean (72ms) : 70, 74
master - mean (72ms) : 70, 74
section Bailout
This PR (8420) - mean (75ms) : 74, 77
master - mean (76ms) : 74, 78
section CallTarget+Inlining+NGEN
This PR (8420) - mean (1,065ms) : 1024, 1107
master - mean (1,076ms) : 1010, 1141
FakeDbCommand (.NET Core 3.1)gantt
title Execution time (ms) FakeDbCommand (.NET Core 3.1)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8420) - mean (112ms) : 109, 115
master - mean (114ms) : 110, 117
section Bailout
This PR (8420) - mean (114ms) : 111, 116
master - mean (115ms) : 112, 118
section CallTarget+Inlining+NGEN
This PR (8420) - mean (786ms) : 763, 809
master - mean (784ms) : 764, 805
FakeDbCommand (.NET 6)gantt
title Execution time (ms) FakeDbCommand (.NET 6)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8420) - mean (100ms) : 96, 103
master - mean (100ms) : 96, 103
section Bailout
This PR (8420) - mean (100ms) : 98, 102
master - mean (101ms) : 98, 104
section CallTarget+Inlining+NGEN
This PR (8420) - mean (931ms) : 893, 969
master - mean (932ms) : 899, 964
FakeDbCommand (.NET 8)gantt
title Execution time (ms) FakeDbCommand (.NET 8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8420) - mean (98ms) : 95, 100
master - mean (98ms) : 94, 103
section Bailout
This PR (8420) - mean (99ms) : 97, 101
master - mean (99ms) : 97, 101
section CallTarget+Inlining+NGEN
This PR (8420) - mean (813ms) : 779, 847
master - mean (814ms) : 781, 848
HttpMessageHandler (.NET Framework 4.8)gantt
title Execution time (ms) HttpMessageHandler (.NET Framework 4.8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8420) - mean (191ms) : 187, 195
master - mean (191ms) : 187, 194
section Bailout
This PR (8420) - mean (194ms) : 192, 196
master - mean (194ms) : 192, 196
section CallTarget+Inlining+NGEN
This PR (8420) - mean (1,144ms) : 1096, 1192
master - mean (1,142ms) : 1100, 1184
HttpMessageHandler (.NET Core 3.1)gantt
title Execution time (ms) HttpMessageHandler (.NET Core 3.1)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8420) - mean (273ms) : 268, 277
master - mean (273ms) : 269, 277
section Bailout
This PR (8420) - mean (273ms) : 271, 276
master - mean (273ms) : 270, 276
section CallTarget+Inlining+NGEN
This PR (8420) - mean (926ms) : 906, 946
master - mean (924ms) : 895, 953
HttpMessageHandler (.NET 6)gantt
title Execution time (ms) HttpMessageHandler (.NET 6)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8420) - mean (268ms) : 262, 275
master - mean (267ms) : 262, 272
section Bailout
This PR (8420) - mean (272ms) : 267, 277
master - mean (266ms) : 264, 268
section CallTarget+Inlining+NGEN
This PR (8420) - mean (1,146ms) : 1115, 1177
master - mean (1,133ms) : 1090, 1176
HttpMessageHandler (.NET 8)gantt
title Execution time (ms) HttpMessageHandler (.NET 8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8420) - mean (281ms) : 274, 289
master - mean (264ms) : 261, 267
section Bailout
This PR (8420) - mean (280ms) : crit, 273, 288
master - mean (265ms) : 261, 268
section CallTarget+Inlining+NGEN
This PR (8420) - mean (1,045ms) : 996, 1094
master - mean (1,019ms) : 980, 1058
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| spanMetaStructs: spanMetaStructs, | ||
| spanEvents: spanEvents); | ||
| spanEvents: spanEvents, | ||
| peerTags: peerTags!, |
There was a problem hiding this comment.
peerTags is nullable so we shouldn't need the ! nullability suppression here.
There was a problem hiding this comment.
Yeah, except the ! is suppressing the inner non-nullability, because the compiler isn't clever enough to know that it's a List<string> not List<string?>, even though we filter string.IsNullOrEmpty (note that StringUtil also doesn't help here)
var peerTags = (jObject["peer_tags"] as JArray)?.Values<string>().Where(x => !string.IsNullOrEmpty(x)).Distinct().OrderBy(x => x).ToList();Removing the dammit gives errors 🙁
| { | ||
| if (!string.IsNullOrEmpty(pattern)) | ||
| { | ||
| compiled.Add(new Regex(pattern, RegexOptions.Compiled, matchTimeout: TimeSpan.FromSeconds(1))); |
There was a problem hiding this comment.
Remember these aren't actually compiled until the first time they are used. Not sure if it matters here, but somewhere else we do a dummy Match() to force the compilation early.
There was a problem hiding this comment.
Yeah, it's a good point, hadn't thought about that. I think this is likely actually the correct behaviour, as it means that compilation is happening at serialization time instead of blocking the discovery service updates, but I could see an argument both ways 🤔 Any preference?
| } | ||
|
|
||
| // Find the root span (ParentId == null or 0) and apply the filter | ||
| foreach (var span in spans) |
There was a problem hiding this comment.
If this span collection follows our usual ordering, the root span will usually be the last span, so it might be faster to search in reverse (from last span to first span).
There was a problem hiding this comment.
Heh, that's actually harder than it seems, because it's a SpanCollection which doesn't expose the root array, but it's a good point. I'll address that when looking into the partial flush issue...
There was a problem hiding this comment.
Pull request overview
Updates the tracer’s client-side stats pipeline to better match the “client-side-stats” 1.2.0 wire format/behavior by expanding aggregation dimensions, adjusting serialization semantics, and incorporating agent-driven discovery/filtering signals.
Changes:
- Extend stats aggregation key + msgpack payload to include additional dimensions (span kind, trace-root flag, HTTP method/endpoint, gRPC status, service source, peer tags) and add stochastic rounding/weighted accumulation.
- Add agent discovery support for peer tags, span kinds eligibility, obfuscation negotiation, and trace-level filtering (exact/regex/resource-based).
- Update SQL obfuscation tokenization/normalization and adjust stats sending API surface (obfuscation header, no-retry for agent stats).
Reviewed changes
Copilot reviewed 32 out of 32 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| tracer/src/Datadog.Trace/Agent/StatsAggregator.cs | Computes stats with new dimensions (peer tags, span kinds), filtering, weighting, and obfuscation negotiation. |
| tracer/src/Datadog.Trace/Agent/StatsBuffer.cs | Serializes updated msgpack wire format and suppresses empty buckets. |
| tracer/src/Datadog.Trace/Agent/StatsBucket.cs | Switch counters to double for weighted accumulation; store peer tags. |
| tracer/src/Datadog.Trace/Agent/StatsAggregationKey.cs | Adds new dimensions to key equality/hash. |
| tracer/src/Datadog.Trace/Agent/DiscoveryService/DiscoveryService.cs | Parses /info additions (peer_tags, span_kinds_stats_computed, obfuscation_version, trace filters). |
| tracer/src/Datadog.Trace/Agent/DiscoveryService/AgentConfiguration.cs | Carries newly discovered config fields through the system. |
| tracer/src/Datadog.Trace/Agent/TraceSamplers/TraceFilter.cs | Implements agent-driven trace-level filtering on root spans. |
| tracer/src/Datadog.Trace/Processors/ObfuscatorTraceProcessor.cs | Expands SQL token splitters and normalizes spaces around operators near placeholders. |
| tracer/src/Datadog.Trace/Agent/Api.cs | Sends stats with obfuscation header and disables retries for agent stats. |
| tracer/src/Datadog.Trace/Agent/ApiOtlp.cs | Updates stats API signature/state for OTLP path. |
| tracer/src/Datadog.Trace/Agent/TraceSamplers/RareSampler.cs | Uses stats aggregator to build keys (signature change). |
| tracer/test/** | Updates/adds tests for new dimensions, filtering, obfuscation behavior, and API signature changes. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| var key = BuildKey(span, out var peerTags); | ||
|
|
||
| var buffer = CurrentBuffer; | ||
|
|
||
| if (!buffer.Buckets.TryGetValue(key, out var bucket)) | ||
| { |
There was a problem hiding this comment.
AddToBuffer() calls BuildKey(span, out var peerTags) for every span before TryGetValue. For client/producer/consumer spans this allocates UTF-8 bytes (and intermediate strings via $"{tagKey}:{tagValue}") even when the bucket already exists and the computed peerTags list is discarded. Consider restructuring so peer-tag materialization only happens when inserting a new bucket (e.g., compute hash first, or defer building utf8PeerTags until after TryGetValue indicates the key is new) to avoid per-span allocation overhead.
| var key = BuildKey(span, out var peerTags); | |
| var buffer = CurrentBuffer; | |
| if (!buffer.Buckets.TryGetValue(key, out var bucket)) | |
| { | |
| var buffer = CurrentBuffer; | |
| var key = BuildKey(span, out _); | |
| if (!buffer.Buckets.TryGetValue(key, out var bucket)) | |
| { | |
| key = BuildKey(span, out var peerTags); |
There was a problem hiding this comment.
Yeah, this one is a pain, I don't have a great answer to it currently 🙁 Will address in a follow up, when I work out how 😅
There was a problem hiding this comment.
To Clarify, the suggested approach is much worse, as it allocates twice 😉
e52cc41 to
562d9e7
Compare
f00d1b6 to
77f3510
Compare
562d9e7 to
0b3eff8
Compare
77f3510 to
536d246
Compare
The ObfuscatorTraceProcessor was running unconditionally, causing double-obfuscation when the agent also obfuscates (version 0 or version > tracer). Now it only runs when the tracer has negotiated obfuscation responsibility (agent version > 0 and <= tracer version). Re-obfuscating an already-obfuscated query produces incorrect results (e.g. SELECT ? FROM users would have ? treated as a literal on the second pass). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The rebase made tracerObfuscationVersion a required parameter on IApi.SendStatsAsync. Update test call sites to pass the argument. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Lucas Pimentel <lucas.pimentel@datadoghq.com>
…ritten as "" if it's not there)
5573211 to
e9ee48d
Compare
Summary of changes
Updates the existing client-side-stats implementation to match version 1.2.0 as defined in the RFC and as implemented in the agent.
Reason for change
Our implementation is severely lagging the latest implementation in the agent. This hasn't been a big deal, as it's not documented and not enabled by default, but we'd like to fix the implementation to make it usable.
Implementation details
This was driven almost entirely by 🤖, by comparing our existing implementation to the RFC, and also taking the agent/go implementation as the canonical implementation.
At a high level, the PR contains the following changes:
SpanKind,IsTraceRoot(as Trilean),HTTPMethod,HTTPEndpoint,GRPCStatusCode,ServiceSource,PeerTagsto both the aggregation key and the msgpack wire formatGRPCStatusCodetype: Changed frominttostringto match Go agent's wire format (agent was returning 400 Bad Request in system tests)rpc.grpc.status_code,grpc.code,rpc.grpc.status.code,grpc.status.code)Hits,Errors,Duration,TopLevelHitsaccumulated asdouble(weighted by sampling rate) then rounded probabilistically toint64for serialization1/rate), matching Go agent behavior"unknown-env"when environment is not configuredgit_commit_sha: Added as optional field in stats payloadService: Added as top-level field in stats payloadHasHits()check prevents sending payloads with zero-hit buckets (stale keys retained for sketch reuse)Starttimestamps aligned to 10-second boundaries (ts - ts % 10_000_000_000) matching Go tracer'salignTsStartTimeproperty (onlyStartas aligned nanoseconds)/infoEndpoint)peer_tags: Parsed, sorted, deduplicated; used for peer tag extraction on client/producer/consumer spansspan_kinds_stats_computed: Parsed to override eligible span kindsobfuscation_version: Parsed for obfuscation negotiationfilter_tags,filter_tags_regex,ignore_resourcesfrom/infoTraceFilterimplementation: Evaluates agent-configured filters (exact tags, regex tags, resource patterns) against root spans before stats computation* / = < > ! & ^ % ~ ? @ : #as token splitters (matching Go agent'sgo-sqllexerisOperator()) so queries likeWHERE id='1'are properly obfuscated=,<,>,!) adjacent to?placeholders, matching Go agent's normalizer output (e.g.,id='1'→id = ?)obfuscation_versionis negotiated with agent; sendsDatadog-Obfuscation-VersionheaderIpAddressObfuscationUtil.QuantizePeerIpAddresses()replacing non-allowed IPs with"blocked-ip-address"_dd.base_service:{serviceName}as sole peer tagStartsWith("synthetics")prefix matching (not exact match)obfuscation_versionin/inforesponseTest coverage
There's a lot going on in this PR, because we were so far behind. I could split this into implementing individual features, but there would be a lot of duplication between PRs, and it didn't seem like it would be that easy to track. At least with this big bang we can compare directly against the system tests etc.
The existing system tests for stats computation were checked, and made to pass (which identified a number of hidden expectations which will be added to the RFC). I'll create a PR to enable these in the system-tests repo
Other details
Part of a stack
charhelpers likeIsAsciiHexDigit#8417IpAddressObfuscationUtilfor use with client-side-stats #8418There are still some theoretical gaps between the go implementation and the .NET implementation, but I think these are non-issues in most cases:
"CANCELLED"→"1"); .NET passes raw tag valuehttp.status_codeandhttp.response.status_code(OTel convention); .NET only checkshttp.status_code0in .NETHTTP_method/HTTP_endpoint_top_levelmetric_top_leveland_dd.top_level; .NET only checks_dd.top_levelspan_derived_primary_tagsThere is another big elephant in the room, which is perf. The peer tags, in particular, currently requires a bunch of allocation. I'd rather defer trying to fight against that to another PR if possible, unless anyone has some clear ideas 😄
Another aspect I'm not sure about is how this interacts with @zacharycmontoya's recent work to publish OTLP stats. I took a random guess and fought the refactoring, but need to verify it.