Update client-side stats to use light weight Hashtable#11382
Conversation
Two general-purpose utilities used by the client-side stats aggregator work (PR #11382 and follow-ups), extracted into their own change so the metrics-specific PRs can build on a smaller, reviewable foundation. - Hashtable: a generic open-addressed-ish bucket table abstraction keyed by a 64-bit hash, with a public abstract Entry type so client code can subclass it for higher-arity keys. The metrics aggregator uses it to back its AggregateTable. - LongHashingUtils: chained 64-bit hash combiners with primitive overloads (boolean, short, int, long, Object). Used in place of varargs combiners to avoid Object[] allocation and boxing on the hot path. No callers within internal-api itself yet -- the metrics aggregator PR will introduce the first usages. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Standalone classes for swapping the consumer-side LRUCache<MetricKey, AggregateMetric> with a multi-key Hashtable in the next commit. No call sites use them yet. - AggregateEntry extends Hashtable.Entry, holds the canonical MetricKey, the mutable AggregateMetric, and copies of the 13 raw SpanSnapshot fields for matches(). The 64-bit lookup hash is computed via chained LongHashingUtils.addToHash calls (no varargs, no boxing of short/boolean). - AggregateTable wraps a Hashtable.Entry[] from Hashtable.Support.create. findOrInsert(SpanSnapshot) walks the bucket comparing raw fields, falling back to MetricKeys.fromSnapshot on a true miss. On cap overrun, it scans for an entry with hitCount==0 and unlinks it; if none, it returns null and the caller drops the data point. - MetricKeys.fromSnapshot extracts the canonicalization logic (DDCache lookups + UTF8 encoding) from Aggregator.buildMetricKey, so the helper can be called from AggregateTable on miss. This also commits Hashtable and LongHashingUtils (added earlier, previously uncommitted) and lifts Hashtable.Entry / Hashtable.Support visibility so client code outside datadog.trace.util can build higher-arity tables -- the case the javadoc describes but the original visibility didn't actually support. Specifically: Entry is now public abstract with a protected ctor; keyHash, next(), and setNext() are public; Support's create / clear / bucketIndex / bucketIterator / mutatingBucketIterator methods are public. Tests: AggregateTableTest covers hit, miss, distinct-by-spanKind, peer-tag identity (including null vs non-null), cap overrun with stale victim, cap overrun with no victim (returns null), expungeStaleAggregates, forEach, clear, and that the canonical MetricKey is built at insert. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace LRUCache<MetricKey, AggregateMetric> with the AggregateTable added
in the prior commit. The hot path in Drainer.accept becomes:
AggregateMetric aggregate = aggregates.findOrInsert(snapshot);
if (aggregate != null) {
aggregate.recordOneDuration(snapshot.tagAndDuration);
dirty = true;
} else {
healthMetrics.onStatsAggregateDropped();
}
On the steady-state hit path the lookup is a 64-bit hash compute + bucket
walk + matches(snapshot) -- no MetricKey allocation, no SERVICE_NAMES /
SPAN_KINDS / PEER_TAGS_CACHE lookups. The canonical MetricKey is now built
once per unique key at insert time, in MetricKeys.fromSnapshot.
Behavioral change in the cap-overrun path
-----------------------------------------
The old LRUCache evicted least-recently-used: at cap, a new insert would
push out the oldest entry regardless of whether it was live or stale.
AggregateTable instead scans for a hitCount==0 entry to recycle, and drops
the new key if none exists. Practical impact: in the common case where
the table holds a stable set of recurring keys, an unrelated burst of new
keys is dropped (and reported via onStatsAggregateDropped) rather than
evicting the established keys. The existing test that asserted "service0
evicted in favor of service10" is updated to assert the new semantics.
The other cap-related test ("should not report dropped aggregate when
evicted entry was already flushed") still passes unchanged: after report()
clears all entries to hitCount=0, the next wave of inserts recycles them.
Threading fix
-------------
ConflatingMetricsAggregator.disable() used to call aggregator.clearAggregates()
and inbox.clear() directly from the Sink's IO event thread, racing with the
aggregator thread mid-write. The race was tolerable for LinkedHashMap; it
is not for AggregateTable (chain corruption can NPE or loop). disable()
now offers a ClearSignal to the inbox so the aggregator thread itself
performs the table clear and the inbox.clear(). Adds one SignalItem
subclass + one branch in Drainer.accept; preserves the single-writer
invariant for AggregateTable end-to-end.
Removed: LRUCache import, AggregateExpiry inner class, the static
buildMetricKey / materializePeerTags / encodePeerTag helpers (now in
MetricKeys).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MetricKey existed for two reasons -- the prior LRUCache key role (now handled by AggregateTable's Hashtable.Entry mechanics) and as the labels argument to MetricWriter.add. The first is gone; the second is the only thing keeping MetricKey alive. Fold its UTF8-encoded label fields onto AggregateEntry, change MetricWriter.add to take AggregateEntry directly, and delete MetricKey + MetricKeys. What AggregateEntry now holds ----------------------------- - 10 UTF8BytesString label fields (resource, service, operationName, serviceSource, type, spanKind, httpMethod, httpEndpoint, grpcStatusCode, and a List<UTF8BytesString> peerTags for serialization). - 3 primitives (httpStatusCode, synthetic, traceRoot). - AggregateMetric (the value being accumulated). - The raw String[] peerTagPairs is retained alongside the encoded peerTags -- matches() compares it positionally against the snapshot's pairs; the encoded form is only consumed by the writer. matches(SpanSnapshot) compares the entry's UTF8 forms to the snapshot's raw String / CharSequence fields via content-equality (UTF8BytesString.toString() returns the underlying String in O(1)). This closes a latent bug in the prior raw-vs-raw matches(): if one snapshot delivered a tag value as String and a later snapshot delivered the same content as UTF8BytesString, the old Objects.equals would return false and the table would split into two entries. Content-equality matching collapses them into one. Consolidated caches ------------------- The static UTF8 caches that used to live partly on MetricKey (RESOURCE_CACHE, OPERATION_CACHE, SERVICE_SOURCE_CACHE, TYPE_CACHE, KIND_CACHE, HTTP_METHOD_CACHE, HTTP_ENDPOINT_CACHE, GRPC_STATUS_CODE_CACHE, SERVICE_CACHE) and partly on ConflatingMetricsAggregator (SERVICE_NAMES, SPAN_KINDS, PEER_TAGS_CACHE) are all now on AggregateEntry. The split was duplicating work -- SERVICE_NAMES and SERVICE_CACHE both cached service-name to UTF8BytesString. One cache per field now. API change: MetricWriter.add ---------------------------- Was: add(MetricKey key, AggregateMetric aggregate) Now: add(AggregateEntry entry) The aggregate lives on the entry. Single-arg. SerializingMetricWriter reads the same UTF8 fields off AggregateEntry that it previously read off MetricKey; the wire format is byte-identical. Test impact ----------- AggregateEntry.of(...) takes the same 13 positional args new MetricKey(...) took, so test diffs are mostly mechanical: new MetricKey(args) -> AggregateEntry.of(args) writer.add(key, _) -> writer.add(entry) ValidatingSink in SerializingMetricWriterTest now iterates List<AggregateEntry> directly. ConflatingMetricAggregatorTest's Spock matchers (~36 sites) rely on AggregateEntry.equals comparing the 13 label fields (not the aggregate) so the mock matches by labels regardless of the aggregate state at call time; post-invocation closures verify aggregate state. Benchmarks (2 forks x 5 iter x 15s) ----------------------------------- The change is consumer-thread only; producer publish() is unchanged. SimpleSpan bench: 3.123 +- 0.025 us/op (prior: 3.119 +- 0.018) DDSpan bench: 2.412 +- 0.022 us/op (prior: 2.463 +- 0.041) Both within noise -- the win is structural (one less class, one less allocation per miss, one fewer cache layer) rather than benchmarked. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
050a998 to
3738c85
Compare
LongHashingUtilsTest (14 cases):
- hashCodeX null sentinel + non-null pass-through
- all primitive hash() overloads match the boxed Java hashCodes
- hash(Object...) 2/3/4/5-arg overloads match the chained addToHash
formula they are documented to constant-fold to
- addToHash(long, primitive) overloads match the Object-version
- linear-accumulation invariant (31 * h + v) holds across a sequence
- iterable / deprecated int[] / deprecated Object[] variants match
chained addToHash
- intHash treats null as 0 (observable via hash(null, "x"))
HashtableTest (24 cases across 5 nested classes):
- D1: insert/get/remove/insertOrReplace/clear/forEach, in-place value
mutation, null-key handling, hash-collision chaining with disambig-
uating equals, remove-from-collided-chain leaves siblings intact
- D2: pair-key identity, remove(pair), insertOrReplace matches on
both parts, forEach
- Support: capacity rounds up to a power of two, bucketIndex stays
in range across a wide hash sample, clear nulls every slot
- BucketIterator: walks only matching-hash entries in a chain, throws
NoSuchElementException when exhausted
- MutatingBucketIterator: remove from head-of-chain unlinks, replace
swaps the entry while preserving chain, remove() without prior
next() throws IllegalStateException
Tests live in internal-api/src/test/java/datadog/trace/util and use the
already-present JUnit 5 setup.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bring the new util/ files in line with google-java-format (tabs → spaces, line wrapping, javadoc list markup) so spotlessCheck passes in CI. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Compares Hashtable.D1 and Hashtable.D2 against equivalent HashMap usage for add, update, and iterate operations. Each benchmark thread owns its own map (Scope.Thread), but @threads(8) is used so the allocation/GC pressure that Hashtable is designed to avoid surfaces in the throughput numbers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Guard Support.sizeFor against overflow and use Integer.highestOneBit; reject capacities above 1 << 30 instead of looping forever. - Add braces around single-statement while bodies in BucketIterator. - Split HashtableBenchmark into HashtableD1Benchmark / HashtableD2Benchmark. - Add regression tests for Support.sizeFor bounds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 5-arg Object overload was forwarding only obj0..obj3 to the int overload, silently dropping obj4. Also align LongHashingUtils.hash 3-arg signature with its 2/4/5-arg siblings (int parameters) and strengthen the 5-arg HashingUtilsTest to detect the missing-arg regression. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Split D1Tests and D2Tests into HashtableD1Test and HashtableD2Test; extract shared test entry classes into HashtableTestEntries. - Reduce visibility of LongHashingUtils.hash(int...) chaining overloads to package-private; they are internal building blocks. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The iterator tests need a populated Hashtable.Entry[] to drive Support.bucketIterator / mutatingBucketIterator. Relaxing D1.buckets from private to package-private lets the same-package tests read it directly, removing the reflection helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The label fields and the mutable counters/histograms are 1:1 with each entry; carrying them on a separate object meant one extra allocation per unique key plus an indirection on every hot-path update. Merging them puts the counters directly on AggregateEntry, drops the entry.aggregate hop, and consolidates ERROR_TAG / TOP_LEVEL_TAG onto the same class the consumer uses to decode them. AggregateTable.findOrInsert now returns AggregateEntry. Callers in Aggregator and SerializingMetricWriter updated. Migrated AggregateMetricTest.groovy to AggregateEntryTest.java per project policy. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a context-passing forEach(T, BiConsumer) overload to AggregateTable, mirroring TagMap's pattern. Aggregator.report now hands the writer in as context to a static BiConsumer so no fresh Consumer is allocated each report cycle. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the TagMap pattern: pairs the existing forEach(Consumer) with a forEach(T context, BiConsumer<T, TEntry>) overload so callers can hand side-band state to a non-capturing lambda and avoid the fresh-Consumer-per-call allocation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Factors the unchecked (TEntry) cast out of D1.forEach / D2.forEach (and the BiConsumer variants) into Support.forEach(buckets, ...). The cast now lives in one place, mirroring how Entry.next() handles it, and the D1/D2 methods become one-liners. Downstream higher-arity tables built on Support gain the same helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Now that Hashtable.Support exposes the parameterized forEach helpers, AggregateTable's own forEach methods can drop their duplicated loop body and the (AggregateEntry) cast. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This method was a vestige of master's Batch design where multiple producer threads wrote into an AtomicLongArray slot concurrently and the aggregator drained ~64 durations per Batch in one call. The new producer/consumer split publishes one SpanSnapshot per span, so production only ever calls recordOneDuration(long). Migrate the three remaining callers (AggregateEntryTest, SerializingMetricWriterTest, MetricsIntegrationTest) to a loop of recordOneDuration(long) calls, then delete the batched method and its AtomicLongArray imports. Drops the recordDurationsIgnoresTrailingZeros test -- that behavior was a specific quirk of the batched API (count parameter shorter than the array length) and doesn't apply to recordOneDuration. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The factory recovers (name, value) pairs from pre-encoded "name:value" strings by splitting at the FIRST colon. Test-only, but worth being explicit so callers don't hand it a peer-tag value containing a colon (URLs, IPv6, service:env) and get a silently wrong (name, value) pair. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The bundled fix in this PR routes the agent-downgrade clear through the inbox so the aggregator thread stays the sole writer to AggregateTable. Prior to this test, there was no regression coverage for that routing. The test fires DOWNGRADED from the test thread (production-like OkHttpSink callback path), waits for the immediate no-flush window, then publishes a marker span with a distinct resource name. The subsequent report's writer.add captor must see only the marker -- if CLEAR didn't actually wipe the original entry, the original "resource" would still be present and the assertion would catch it. Cannot directly verify thread identity of the clear from inside this test (CLEAR's inbox.clear() drops any latch signal we'd queue behind it), so this is an observable-contract test rather than a strict thread-id test. Still catches both the missing-clear regression and the bucket-chain-corruption regression that the original threading race could produce. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…optimize-metric-key
…dinality Resolved conflicts: - AggregateEntry.java: kept #11387's cardinality-handler constructor shape; dropped the dead recordDurations(int, AtomicLongArray) batch API and its AtomicLongArray import (the #11382 cleanup). The of() colon-split warning from #11382 doesn't apply because #11387's of() takes peerTags as a pre-built list (no colon-splitting on test reconstruction). - ClientStatsAggregator.java: kept #11387's PeerTagSchema.of(..., healthMetrics) 3-arg signature; preserved #11382's read-order race fix (lastTimeDiscovered read before peerTags); preserved the resetCardinalityHandlers method from HEAD. - ClientStatsAggregatorTest.groovy: kept HEAD's cardinality-focused tests; applied the #11382 Spock-assert fix (wrap '>> { closure }' body expressions in assert) to all 31 sites so power-assert surfaces mismatches. - ClientStatsAggregatorBootstrapTest.java: renamed the reconcileSwapsSchemaWhenTagSetChanges test's ConflatingMetricsAggregator references to ClientStatsAggregator. - ConflatingMetricsAggregatorDisableTest.java: renamed file + references to ClientStatsAggregatorDisableTest with corresponding class-name change. - AdversarialMetricsBenchmark.java: updated ConflatingMetricsAggregator + ConflatingMetricsAggregatorBenchmark references to ClientStatsAggregator / ClientStatsAggregatorBenchmark. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ory-efficiency Resolved conflicts: - AdversarialMetricsBenchmark.java: kept #11389's Javadoc (specific to AggregateTable + cardinality handlers), applied #11382's LongAdder + concise tearDown cleanup. - AggregateEntry.java: dropped recordDurations(int, AtomicLongArray) per the #11382 cleanup (no longer used after migrating the test callers to recordOneDuration). - MetricsIntegrationTest.groovy: migrated to per-duration recordOneDuration calls. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| private AggregateEntry(SpanSnapshot s, long keyHash) { | ||
| super(keyHash); | ||
| this.resource = canonicalize(RESOURCE_CACHE, s.resourceName); | ||
| this.service = SERVICE_CACHE.computeIfAbsent(s.serviceName, UTF8_ENCODE); |
There was a problem hiding this comment.
before the service name was also set to EMPTY when null. I think this in reality cannot happen but perhaps it's worth going through canonicalize also for this
There was a problem hiding this comment.
Yes, I need to revisit the null vs EMPTY story. As I recall the null handling was already a mix of two approaches before, but we should fix it.
There was a problem hiding this comment.
From Claude (posted on dougqh's behalf):
Done in d1749389b9. service and spanKind now both go through canonicalize like every other label field — null maps to EMPTY instead of NPE'ing at the cache call. Dropped the Functions.UTF8_ENCODE import too since the canonicalize helper handles the create.
Pinned by AggregateTableTest.nullServiceAndSpanKindDoNotNpeAndCollapseWithEmpty which publishes (null, null), (null, null) again, and ("", ""), asserts a single entry and length-0 service/spanKind. Without the fix, the first publish would NPE'd at .computeIfAbsent.
| } | ||
| } | ||
|
|
||
| public void clearAggregates() { |
There was a problem hiding this comment.
this looks now a dead method. Given that is not thread safe wrt the underlying hashtable I would remove it or reduce its visibility
There was a problem hiding this comment.
From Claude (posted on dougqh's behalf):
Deleted in 21e75452d6 — the method had no callers after the disable() rewrite, and you're right that exposing a thread-unsafe mutator from outside the inbox routing would invite the exact race that motivated the ClearSignal path. The comment in ConflatingMetricsAggregator that named it has been updated to describe the inbox-routed clear instead.
| h = LongHashingUtils.addToHash(h, s.synthetic); | ||
| h = LongHashingUtils.addToHash(h, s.traceRoot); | ||
| h = LongHashingUtils.addToHash(h, s.spanKind); | ||
| if (s.peerTagSchema != null && s.peerTagValues != null) { |
There was a problem hiding this comment.
Minor: this null check makes the equals not symmetrical since we're skipping here the field from including into the hash - also combinations with different nulls on peerTagSchema/peerTagValues can produce the same hash theoretically
There was a problem hiding this comment.
I think at this point I should just get rid of equals & hashCode. They aren't really used by the new table structure.
As for the nulls in the hash code, yes, that can lead to hash collisions, but that's fine.
That said, I do think I'll rework the hashOf routine to be a bit lighter weight. I may or may not change the null handling when I do so.
There was a problem hiding this comment.
From Claude (posted on dougqh's behalf):
Both halves of this are now fixed:
-
Hash side (
80778c4e87): dropped the conditional null-skip.hashOfnow unconditionally mixes inLongHashingUtils.addToHash(h, s.peerTagSchema)andArrays.hashCode(s.peerTagValues).PeerTagSchema.hashCode()is content-based on names (lazily cached), so distinct schema instances with identical names still produce a stable hash. Different-null combinations on the two fields can no longer collide. -
Equals side (
2536aa2e76):AggregateEntry.equalsnow compares the rawpeerTagNames/peerTagValuesarrays viaArrays.equals, mirroring exactly whathashOfhashes. Same forbe13431744onPeerTagSchema(symmetric equals/hashCode pair onnames).
Two regression tests pin the contract: AggregateEntryTest.equalsConsistentWithHashCodeAcrossDifferentSchemaLayouts (entries with ["a","b"]/[null,"x"] vs ["b","c"]/["x",null] — same encoded peer-tags ["b:x"] but different raw layouts → not equal, different hashCodes) and equalEntriesHaveEqualHashCodes (the positive case).
| * persistently at cap, this is the place to consider caching a cursor across calls so the scan | ||
| * resumes where it left off. | ||
| */ | ||
| private boolean evictOneStale() { |
There was a problem hiding this comment.
What highlighted in the comment is true. However if I'm not wrong when the aggregator start dropping (meaning that capacity is full and badly dimensioned) this will translate in a perf kill. I'm wondering if it's worth adding a field for the last scanned bucket to have the cost amortized right now
There was a problem hiding this comment.
Can you clarify "perf kill"? Yes, this will end up doing an O(n) scan of the table, but the table is small.
Yes, that will slow down the metrics thread, but it won't kill the application.
I've primarily focused on making sure things don't fall over under an adversarial workload. In those cases, such last used caches don't work. Although, I suppose I could check a more realistic accidental case where only a single component has out of control cardinality.
There was a problem hiding this comment.
was about needing to scan the full table for each span inserted that at the end cannot insert since will overflow. but perhaps I've misinterpret something
There was a problem hiding this comment.
Okay, I'll double check the eviction logic. That is worth being careful about.
Although, the next part that introduces cardinality limiters largely renders the table limit unnecessary.
There was a problem hiding this comment.
I updated the eviction to track the last bucket from one scan to the next.
To do that, I added a variation of the Hashtable iterator that scans a range of buckets.
Since only this code is using Hashtable right now, I went ahead and included that in this PR
| aggregates.clear(); | ||
| inbox.clear(); | ||
| } | ||
| ((SignalItem) item).complete(); |
There was a problem hiding this comment.
should the dirty flag be also cleared up here?
There was a problem hiding this comment.
Good question. I think I need to go back and review all the signaling a bit more.
There were some latent problems that already existed in the code that have been exposed during the reviews.
There was a problem hiding this comment.
Added in the clearing of the dirty flag.
I wonder if that was a latent bug because I mostly intended just to shuffle around the existing code. It might have been that we were just getting a spurious reporting cycle previously.
Anyway, it is fixed now.
The constructor canonicalizes null fields through canonicalize() which
returns UTF8BytesString.EMPTY for null inputs (or a cached
UTF8BytesString("") for empty-string inputs). But matches() compared
those entries against subsequent snapshots via contentEquals(...) /
stringContentEquals(...), which treated non-null UTF8BytesString vs
null CharSequence as inequal.
Result: two snapshots with the same null-valued resource/operation/
type/serviceSource hashed to the same bucket (intHash(null) == 0 ==
"".hashCode()), but matches() returned false on the EMPTY-vs-null
field comparison, so the second snapshot inserted a *duplicate*
entry into the table. Same path for empty-string vs null.
Unify the semantics: null and length-zero are treated as equivalent
on either side of contentEquals/stringContentEquals. The hash already
agreed (intHash(null) == "".hashCode() == 0), so this restores the
matches() contract to match the existing hash contract.
Adds AggregateTableTest.nullAndEmptyOptionalFieldsCollapseToOneEntry
to pin the contract: two null-fielded and one empty-string-fielded
snapshot must all hit the same entry. Test would have failed before
the fix (a duplicate insert) but the existing 10 cases still pass.
Resolves sarahchen6's review comment on AggregateEntry.java:113 and
amarziali's related concern on AggregateEntry.java:114.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After CLEAR runs the table is empty but dirty would still carry over from any prior SpanSnapshot insert. The next report() would see dirty=true, expunge no-op the empty table, find isEmpty(), and log "skipped metrics reporting because no points have changed" -- same observable outcome, but resetting dirty here keeps the invariant "dirty implies there's data to flush" honest. Resolves amarziali's review comment on Aggregator.java:121. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously hashOf wrapped the peer-tag contribution in `if (s.peerTagSchema != null && s.peerTagValues != null)`. That meant two snapshots with different null arrangements (schema-null vs values-null) collapsed to the same hash, getting resolved only by the field-by-field matches() fallback at the bucket walk -- wasteful, and the asymmetry hurt hash quality generally. Replace with unconditional contributions: - PeerTagSchema now overrides hashCode() to be content-based on names (lazy + cached, benign-race pattern matching UTF8BytesString / utf8Bytes elsewhere). addToHash(h, schema) routes through that. - For the String[] values, pass Arrays.hashCode(values) through the int overload -- Object[].hashCode() is identity-based by default, so we have to compute content hash explicitly. Null arrays hash to 0 via Arrays.hashCode's contract. Null inputs on either side now hash to 0 distinctly from any real schema or non-empty values array, so all four null combinations are distinguishable. Same final hash for content-equal inputs across schema replacements (the reconcile path), which preserves the entry- hit invariant after the aggregator rebuilds the schema. Resolves amarziali's review comment on AggregateEntry.java:309 and dougqh's suggestion on AggregateEntry.java:310. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Once the ClearSignal routing replaced the direct disable()-to-table mutation, clearAggregates() lost all its call sites -- no production code, no test code. Worse, leaving it public invited future callers to bypass the ClearSignal contract and race against Drainer.accept on the aggregator thread. Drop the method outright. Update the inline comment in ConflatingMetricsAggregator.disable() to not name the deleted method. Resolves amarziali's review comment on Aggregator.java:82. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…terator Previously AggregateTable.evictOneStale walked the bucket array from bucket 0 on every call. Under sustained cap pressure with mostly-hot entries clustered in low buckets, every eviction re-scanned the same hot prefix before finding a cold entry. amarziali's review concern. Add a cursor: after a successful eviction, remember the bucket where it landed. The next call resumes from there. Worst case for a single call is still O(N) when nearly every entry is hot, but a sustained eviction stream amortizes to O(1) per call -- the hot prefix is never re-scanned more than twice across N evictions. Implemented as two iterators driving [cursor, length) then [0, cursor), which required a small Hashtable.Support API addition: - New `mutatingTableIterator(buckets, startBucket, endBucket)` overload for walking a half-open bucket range. The existing zero-arg overload is kept; it now delegates to the new ctor with [0, buckets.length). - New `MutatingTableIterator.currentBucket()` accessor exposing the bucket index of the entry last returned by next() (or -1 before any next/after a remove). AggregateTable saves this as the new cursor. - The empty-range case (startBucket == endBucket) yields an immediately-exhausted iterator -- this is what makes the wrap-around pass [0, cursor) naturally produce nothing when cursor == 0, so the two-pass driver in evictOneStale needs no special case. Tests: - 4 new HashtableTest cases covering the half-open API, empty ranges, out-of-range bounds, and currentBucket() behavior before/after next. - 2 new AggregateTableTest cases: backToBackEvictionsAllSucceed (drives 3x capacity worth of cap-overrun inserts; each must succeed, which only holds if the cursor advances correctly) and clearResetsCursorForSubsequentEvictions (clear() also resets the cursor so subsequent eviction passes start from bucket 0). Resolves amarziali's review comment on AggregateTable.java:75. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dougqh's review comment on AggregateEntry.java:153 asked to keep test code out of the production class. Move the factory to a new AggregateEntries helper in src/test/java/datadog/trace/common/metrics. Same package so it can call the package-private forSnapshot(); delegating to forSnapshot also means no need to widen the AggregateEntry constructor visibility. The 37 src/test/groovy call sites get a mechanical rewrite of AggregateEntry.of(...) -> AggregateEntries.of(...) (36 in ConflatingMetricAggregatorTest, 1 in SerializingMetricWriterTest). src/traceAgentTest is a separate source set without compile-time visibility into src/test, so its 2 MetricsIntegrationTest.groovy call sites can't use AggregateEntries. Migrated those to construct a SpanSnapshot inline + call AggregateEntry.forSnapshot(snapshot). Groovy's permissive package-private access makes this work from the default package the integration test currently sits in. Resolves dougqh's review comment on AggregateEntry.java:153. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
equals compared the pre-encoded peerTags List<UTF8BytesString> while
hashCode (via hashOf) mixes in the raw peerTagSchema + values arrays.
Two entries built from different schema layouts can collapse to the
same encoded form -- e.g. tag "b" at index 1 in schema {a,b} with
values {null,"x"} produces the same encoded ["b:x"] as schema {b,c}
with values {"x",null}. equals returned true; hashCodes differed.
Hashcode contract violated.
Switch equals to compare the raw peerTagNames + peerTagValues arrays,
mirroring matches(SpanSnapshot) and hashOf(SpanSnapshot). The
production lookup path (AggregateTable.findOrInsert) already uses
those, so this just brings equals in line with the rest of the class.
Adds two regression tests on AggregateEntryTest:
- equalsConsistentWithHashCodeAcrossDifferentSchemaLayouts: the
failing-case shape above. Pre-fix, the encoded-list equals returned
true while hashCodes differed; now equals returns false and the
hashCodes differ in agreement.
- equalEntriesHaveEqualHashCodes: positive case -- two entries from
identical snapshots must equal and share hashCode.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Prior CLEAR handler called inbox.clear() as belt-and-suspenders cleanup of in-flight snapshots. That would also erase any STOP signal queued behind CLEAR -- a real concern in disable() -> close() sequences, where the trampled STOP leaves the aggregator thread spinning until thread.join's timeout. sarahchen6 surfaced this from a Codex pass on the CLEAR logic; dougqh confirmed it's worth fixing. The CLEAR handler now clears only the aggregates table. Queued snapshots will drain naturally into the just-cleared table -- but since features.supportsMetrics() is already false by the time CLEAR was offered, producers have stopped publishing; the inbox drains and empties on its own. Worst case: one extra reporting cycle of wasted work on stale snapshots that the agent rejects, which triggers another DOWNGRADED -> disable() -> CLEAR. Self-healing, same as before. Adds ConflatingMetricsAggregatorDisableTest.clearDoesNotTrampleQueuedStopSignal: publish a snapshot, fire DOWNGRADED, call close(); the test bounds close() with its own 2s timeout and asserts the thread exits within it. Pre-fix this would have hung out THREAD_JOIN_TIMEOUT_MS; post-fix it returns in milliseconds. Resolves sarahchen6/Codex's CLEAR-trampling-STOP review comment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The prior commit added a content-based hashCode() but left equals falling back to Object.equals (reference identity). That violates the hashCode contract for any caller that compares two distinct schema instances built from the same tag list -- e.g. before/after a reconcile rebuilds the cached schema with an unchanged tag set. equals() now mirrors hashCode(): content-equal on names. The reconcile- timing field lastTimeDiscovered is intentionally excluded from both -- it's bookkeeping for the aggregator's discovery-version compare, not part of schema identity. Tests: - equalsIsContentBasedOnNames -- same names, two instances, equal + matching hashCode. - equalsIgnoresLastTimeDiscovered -- pins that the bookkeeping field doesn't leak into identity. - equalsDistinguishesByOrder -- names is positional (pairs with SpanSnapshot.peerTagValues by index), so reordered schemas are not interchangeable. - equalsHandlesNullAndOtherTypes -- contract corners. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AggregateEntry's constructor canonicalized resource, operationName,
type, and serviceSource (mapping null -> EMPTY via the canonicalize
helper) but called SERVICE_CACHE.computeIfAbsent / SPAN_KIND_CACHE
.computeIfAbsent directly for service and spanKind. Inputs of null
would NPE on the cache call.
Production paths never pass null for these -- DDSpan always supplies
a service, and the producer defaults spanKind to "" via
unsafeGetTag(SPAN_KIND, (CharSequence) "") -- so this is a latent-
defense fix, not a live bug. But the matches/contentEquals logic
already treats null and length-zero as equal on both sides, and every
other label field in the constructor defends via canonicalize. Two
unprotected outliers are an inconsistency that bites the next person
who reaches for a new code path.
Drops the Functions.UTF8_ENCODE import (its sole use was the service
cache line) -- canonicalize internally creates the UTF8BytesString.
Test: AggregateTableTest.nullServiceAndSpanKindDoNotNpeAndCollapseWithEmpty
publishes (null, null), (null, null) again, and ("", ""); asserts a
single entry results and that getService()/getSpanKind() are length-0.
Without the fix, the first publish would have NPE'd at the
.computeIfAbsent call.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ound-work' into dougqh/optimize-metric-key # Conflicts: # dd-trace-core/src/main/java/datadog/trace/common/metrics/PeerTagSchema.java # dd-trace-core/src/test/java/datadog/trace/common/metrics/PeerTagSchemaTest.java
| return aStr.contentEquals(b); | ||
| } | ||
|
|
||
| private static boolean stringContentEquals(UTF8BytesString a, String b) { |
There was a problem hiding this comment.
can remove this method and just use contentEquals above since String is a CharSequence?
| * subsequent snapshot whose field is still null. {@code intHash(null) == 0 == "".hashCode()}, so | ||
| * the hash already agrees with this view. | ||
| */ | ||
| private static boolean contentEquals(UTF8BytesString a, CharSequence b) { |
There was a problem hiding this comment.
nit: this method could be simplified to something like
private static boolean contentEquals(UTF8BytesString a, CharSequence b) {
if (a == null || a.length() == 0) {
return b == null || b.length() == 0;
}
return b != null && a.toString().contentEquals(b);
}
| * reach {@link AggregateEntry#forSnapshot(SpanSnapshot)} and the package-private {@link | ||
| * SpanSnapshot} constructor. | ||
| */ | ||
| public final class AggregateEntries { |
There was a problem hiding this comment.
since this is now test-specific, it would make sense to me to have a more test-related name, instead of reading nearly identical to AggregateEntry?
| // ""), but the prior contentEquals impl treated `non-null vs null` as not-equal -- so a second | ||
| // snapshot with the same null fields hashed to the same bucket but failed matches(), causing a | ||
| // spurious duplicate insert. The fix unifies null and length-zero on both sides of | ||
| // contentEquals/stringContentEquals. |
There was a problem hiding this comment.
nit: I think this comment could be removed. It seems to explain a fix implemented during the PR iteration process and how this test case addresses that fix, but the test case can stand on its own, i.e. future code readers only need to know what the final code expected behavior is.
this may apply to the comment below too for nullServiceAndSpanKindDoNotNpeAndCollapseWithEmpty()
| // Regression: prior CLEAR handler called inbox.clear(), which would erase any STOP signal | ||
| // queued behind it. close() then waited out thread.join's timeout because Drainer never saw | ||
| // the STOP and `stopped` was never set. Now the CLEAR handler clears only the aggregates | ||
| // table; queued signals (STOP, REPORT) survive and get processed normally. |
There was a problem hiding this comment.
same here... maybe keeping the comment as just:
CLEAR handler clears only the aggregates table; queued signals (STOP, REPORT) survive and get processed normally.
basically removing references to the PR iteration process and keeping just the expected behavior / objectively helpful context in comments
| PeerTagSchema b = PeerTagSchema.testSchema(new String[] {"peer.hostname", "peer.service"}); | ||
|
|
||
| assertEquals(a, b); | ||
| assertEquals(b, a); |
There was a problem hiding this comment.
nit: this line seems unnecessary given 100 and 102 🤔
There was a problem hiding this comment.
Left a few nits, but otherwise looks reasonable!!
Also this was Claude's advice for the failing agent integration test:
Fix options (in order of invasiveness):
1. Add a public test-support module — extract a small AggregateEntryTestSupport factory into a test-support
jar that traceAgentTest can depend on, keeping the production classes package-private.
2. Make testSchema + forSnapshot public on their classes — simplest, but widens the API surface of
intentionally-internal classes.
3. Rewrite the test to not reference these classes directly — drive the test entirely via
ConflatingMetricsAggregator publishing real DDSpans, which avoids any dependency on the internal
AggregateEntry/PeerTagSchema types.
Option 3 is the cleanest because it also makes the integration test more realistic (it exercises the full
pipeline instead of bypassing the producer path).
What Does This Do
Replaces the MetricKey based HashMap with a new AggregateTable based on the light Hashtable
Motivation
By using the light Hashtable, I'm able to avoid the biggest source of allocation in client-side stats: MetricKey
Hashtable provides utilities for searching the entries without constructing a new composite key object
First, the components are hashed together to find the corresponding bucket
Then the bucket can be traversed to see if the entries match the key components
And the custom Entry can hold multiple fields that comprise the data / metadata needed for metric aggregation and eviction policy
The end result is that both MetricKey and AggregateMetric can be merged into a single class AggregateEntry that is only constructed when there's no existing matching entry
Additional Notes
Stacked on top of #11381 (
dougqh/conflating-metrics-background-work). #11409 — theHashtableutility this PR builds on — landed in master separately, so the diff shown here is only the AggregateTable / AggregateEntry / ClearSignal work plus follow-up cleanups.Restructures the consumer-side aggregate store. Three logical commits, intended to be reviewed in order:
1. Add
AggregateTable+AggregateEntrybacked byHashtableIntroduces a multi-key hash table that lets the consumer thread look up the {labels → counters} entry directly from a
SpanSnapshot's raw fields — noMetricKeyallocation per snapshot, no per-snapshot UTF8 cache lookups, no CHM operations. Hot-path lookup iskeyHash compute→Hashtable.Support.bucket→ bucket walk →matches(keyHash, snapshot)→ returned entry has the counters to mutate in place.This commit is standalone — no call sites yet, only the new classes + unit tests for hit/miss/cap-overrun/expunge/clear behavior.
2. Swap
Aggregatorto useAggregateTable+ routedisable()clear through aClearSignalReplaces
LRUCache<MetricKey, AggregateMetric>withAggregateTableinAggregator. Drops theAggregateExpirylistener — drop reporting (onStatsAggregateDropped) moves to the cap-overrun path insideDrainer.accept.Threading fix bundled here:
ConflatingMetricsAggregator.disable()used to callaggregator.clearAggregates()andinbox.clear()directly from the Sink's IO callback thread, racing with the aggregator thread. That race was tolerable forLinkedHashMap(worst case = corrupted internal state right before everything got cleared anyway); it's not tolerable forHashtable(chain corruption can NPE or loop).disable()now offers aClearSignalto the inbox so the aggregator thread itself performs the clear — preserves the single-writer invariant forAggregateTableend-to-end. The offer is best-effort; the system self-heals on a subsequent downgrade cycle if the inbox happens to be full (commented at the call site).Cap-overrun semantic change: the old
LRUCacheevicted least-recently-used in O(1).AggregateTableinstead scans for ahitCount==0entry to recycle (O(N) worst-case), and drops the new key if none exists. Practical impact: in steady state, an unrelated burst of new keys gets dropped (and reported viaonStatsAggregateDropped) rather than evicting established keys. The cost trade-off is commented at the eviction site — eviction is expected rare because the cap is sized to the working set; cursor-caching is the future option if a workload runs persistently at cap. The existing test that asserted "service0 evicted in favor of service10" is updated to assert the new semantics; the other cap-related test ("evicted entry was already flushed") still passes unchanged.3. Fold
MetricKey+AggregateMetricintoAggregateEntryMetricKeyexisted for two reasons — being theLRUCachekey (replaced byAggregateTable's Hashtable mechanics) and being the labels arg toMetricWriter.add(the only thing left).AggregateMetricwas the counter/histogram counterpart. Folds both onto a singleAggregateEntry(10 UTF8 label fields + 3 primitives + counters + histograms), changesMetricWriter.add(MetricKey, AggregateMetric)→add(AggregateEntry), and deletesMetricKey.java+MetricKeys.java+AggregateMetric.java.The 12 UTF8 caches that used to be split between
MetricKey(9) andConflatingMetricsAggregator(3, with overlap) are consolidated onAggregateEntry. One cache per field type now.Latent bug fix: the prior
matches(SpanSnapshot)usedObjects.equalson raw fields. If the same logical key was delivered once asStringand once asUTF8BytesString(differentCharSequenceimpls of identical content),Objects.equalsreturned false and the table would split into two entries for the same key. The newmatchesuses content-equality (UTF8BytesString.toString()returns the underlyingStringin O(1)), collapsing them correctly.Test impact:
AggregateEntry.of(...)mirrors the priornew MetricKey(...)positional args, so test diffs are mostly mechanical. About 56 test sites migrated acrossConflatingMetricAggregatorTest,SerializingMetricWriterTest, andMetricsIntegrationTest.Review polish
Follow-up commits address review feedback:
Hashtable.Support.create(maxAggregates, Support.MAX_RATIO)+Support.bucket+Support.insertHeadEntry(buckets, keyHash, entry)+Support.mutatingTableIteratorto delegate to the helpers added on Add Hashtable and LongHashingUtils utilities #11409 — drops ~50 lines of bespoke bucket-array code.BiConsumerconstant (the JIT reuses non-capturing lambdas).AggregateEntry.matches(long keyHash, SpanSnapshot)overload that pre-checks the hash, so chain walks read as one call.@Nullable(javax.annotation) annotations on the four nullable label fields + their getters +of(...)parameters.Objects.equalsimport inAggregateEntry.equals()(no more fully-qualified refs).evictOneStale(O(N) scan rationale) anddisable()(best-effort offer rationale).Additional cleanups
A second round of review surfacing landed these:
.aggregateruntime bug — the legacyentry.aggregate.recordDurations(...)form compiled under Groovy's dynamic dispatch but would have thrownMissingPropertyExceptionat runtime. Fixed to callrecordDurationsdirectly on the entry. (Bot-flagged.)>> { closure }no-op assertions inConflatingMetricAggregatorTest— the>>operator stubs a return value, so the closures verifyinge.getHitCount() == X && ...were being evaluated and discarded. Wrapped 31 sites inassertso Groovy power-assert surfaces mismatches. All 41 tests still pass, so the previously-unverified assertions happened to hold. (Bot-flagged.)recordDurations(int, AtomicLongArray)batch API — vestige of master'sBatchdesign. Production now only callsrecordOneDuration(long). Migrated the three remaining test callers (AggregateEntryTest,SerializingMetricWriterTest,MetricsIntegrationTest) to loops ofrecordOneDurationcalls, then deleted the batched method and itsAtomicLongArrayimports.AggregateEntry.of()colon-split Javadoc warning — the test factory recovers (name, value) pairs from"name:value"strings by splitting at the first:, which is brittle if a peer-tag value contains a colon (URLs, IPv6,service:env). Added an explicit warning so callers know to keep test data colon-free in values.ConflatingMetricsAggregatorDisableTest— new JUnit 5 coverage for thedisable() → ClearSignalthreading routing. The test firesDOWNGRADEDfrom the test thread, waits for the no-flush window, then publishes a marker span with a distinct resource name and asserts the next flush captures only the marker — proving CLEAR actually wiped the original entry from the table. Catches both the missing-clear regression and the bucket-chain-corruption regression that the original threading race could produce.Benchmarks
Producer
publish()latency (single-threaded, 2 forks × 5 iter × 15s)All within noise on the producer side — this PR is a consumer-side refactor, so producer
publish()shouldn't move much. The structural wins (one less class, no per-snapshotMetricKeyallocation on the consumer, no double-cache lookups, smaller per-entry footprint) only become visible when the consumer is hammered hard enough that snapshot processing rate matters. That's exactly what the adversarial bench measures.AdversarialMetricsBenchmark(8 producer threads, 2×15s warmup + 5×15s, 1 fork)Same benchmark used on #11381 (high-cardinality
(service, operation, resource, peer.hostname)per op, random durations across 1ns–1s, random error/topLevel flags — designed to saturate every capacity bound at once).onStatsInboxFull(drops at handoff)onStatsAggregateDropped~70× faster than master, ~5.7× faster than #11381 alone. The producer fast-path drop check (
inbox.size() >= inbox.capacity()) returns immediately when the inbox is saturated, so when the consumer can drain ~5× faster (this PR's consumer-side win), the producer spins throughpublish()at the fast-path rate and inbox-full drops climb correspondingly (200M on #11381 → 2.9 B here). That's the design working as intended: under attack, load shedding happens at the cheapest place in the pipeline.The aggregate-cache drop count actually fell (84M → 16M) because the faster consumer keeps the table from staying at cap as often — fewer snapshots arrive at a cap-overrun state with no stale entry to recycle.
Per-iteration shape stays flat (27.4M–28.1M ops/s, no warmup → measurement degradation), confirming the design holds steady-state under sustained load.
Cardinality-isolation companions (8 producer threads, 2×15s warmup + 5×15s, 1 fork)
HighCardinalityResourceMetricsBenchmarkandHighCardinalityPeerMetricsBenchmark(added in #11381) pin every dimension except one to attribute throughput deltas to a specific axis.HighCardinalityResourceMetricsBenchmark— onlyresourcevaries (~1M distinct),service/operation/peer.hostnamepinned:onStatsInboxFullonStatsAggregateDroppedHighCardinalityPeerMetricsBenchmark— onlypeer.hostnamevaries (~32K distinct),service/operation/resourcepinned:onStatsInboxFullonStatsAggregateDropped+454 % over master / +214 % over #11381 on the resource axis; +384 % / +231 % on the peer axis. The pattern is consistent with the adversarial run: master is sensitive to which dimension is wide (resource at ~1M hits
pending/keysCHM contention harder than peer.hostname at ~32K); this PR runs at ~28–30 M ops/s on either bench because the consumer is no longer the bottleneck. Variance also collapses (±486K / ±1.1M here vs ±2.7M / ±2.9M on #11381), reflecting that the new hashtable hits steady-state immediately with no JIT warmup tail.The drop-counter shape on this PR flips even further toward inbox-full (170× ratio against aggregate-dropped on both benches), confirming that the consumer drains so fast the LRU aggregate cache barely needs to evict.
Net code delta: +1280 / −903 = +377 lines across 16 files. The growth is dominated by new test coverage (
AggregateTableTest,AggregateEntryTest) plus the consolidated UTF8 caches landing onAggregateEntry; the production-code core (lessMetricKey+MetricKeys+AggregateMetricminusAggregateEntry's additions) is roughly flat.Test plan
./gradlew :dd-trace-core:test --tests 'datadog.trace.common.metrics.*'passes (incl. the newAggregateTableTestandAggregateEntryTest)./gradlew :dd-trace-core:compileJava :dd-trace-core:compileTestGroovy :dd-trace-core:compileJmhJava :dd-trace-core:compileTraceAgentTestGroovyall green./gradlew spotlessCheckcleanstats.dropped_aggregatessemantics at high cardinality (especially the new "drop new on cap overrun" path vs. the old "evict LRU" path)🤖 Generated with Claude Code