Cache hashCode on UTF8BytesString by dougqh · Pull Request #11444 · DataDog/dd-trace-java

dougqh · 2026-05-22T04:53:59Z

What Does This Do

Caches the computed hashCode on UTF8BytesString rather than re-delegating through string.hashCode() on every call.

Motivation

UTF8BytesString.hashCode() currently looks like:

@Override
public int hashCode() {
  return this.string.hashCode();
}

String already caches its own hash, so subsequent calls don't recompute the value, but every call out of UTF8BytesString still pays for the call into String.hashCode() plus the cached-hash field check and branch inside it. For a class that ends up as a hash key in tag caches, metric label sets, and the new cardinality-handler probe tables, that overhead adds up.

This change caches the hash on UTF8BytesString itself, so subsequent calls return immediately via a single field read.

Implementation notes

Benign-race pattern, identical to the existing utf8Bytes lazy initializer in this class:

The cache field is private int cachedHashCode — initialised to 0 by JVM default.
Two threads computing the same hash in parallel produce identical results; no synchronization required.
int writes are atomic per JLS, so a reader can't observe a partial value.
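If the actual hashCode is 0 (rare; the empty string is one such case), the cache never takes effect and every call falls through to string.hashCode(). String made the same trade-off before JDK 13 added its hashIsZero flag. Not worth a separate "is-zero" flag here.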
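
The notes above can be sketched as follows. This is a minimal illustration, not the code in the PR: CachedHashString and CachedHashDemo are hypothetical stand-ins for UTF8BytesString, and only the hash-caching part is modelled (the utf8Bytes initializer is omitted).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for UTF8BytesString; only the hash caching is sketched.
final class CachedHashString {
  private final String string;

  // 0 doubles as "not computed yet". Deliberately not volatile and not synchronized.
  private int cachedHashCode;

  CachedHashString(String string) {
    this.string = string;
  }

  @Override
  public int hashCode() {
    // Read the field once into a local so the value tested is the value returned.
    int h = cachedHashCode;
    if (h == 0) {
      h = string.hashCode();
      // Racy but benign: every thread that gets here stores the same int.
      cachedHashCode = h;
    }
    return h;
  }

  @Override
  public boolean equals(Object other) {
    return other instanceof CachedHashString
        && string.equals(((CachedHashString) other).string);
  }
}

class CachedHashDemo {
  public static void main(String[] args) {
    Map<CachedHashString, Integer> counts = new HashMap<>();
    counts.merge(new CachedHashString("service.name"), 1, Integer::sum);
    counts.merge(new CachedHashString("service.name"), 1, Integer::sum);
    System.out.println(counts.size()); // one entry: equal keys share a hash
    System.out.println(new CachedHashString("").hashCode()); // 0, so never cached
  }
}
```

Copying the field into a local before testing it matters in the racy version: reading the field twice could, in principle, let the second read observe a different value than the one that was checked.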

Benchmark

Measured on AdversarialMetricsBenchmark (8 producer threads, high-cardinality unique-per-op labels saturating every cardinality cap in the metrics subsystem, 2×15s warmup + 5×15s measurement iterations):

Configuration	Throughput avg (ops/s)	Per-iteration (ops/s)
Baseline	5,165,149 ± 1,036,100	5.03M → 5.64M → 5.02M → 5.03M → 5.10M
With hashCode cache	5,776,653 ± 1,215,399	5.60M → 5.47M → 5.71M → 5.81M → 6.29M

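~12% improvement in average throughput (5.17M to 5.78M ops/s). Three of the five cached iterations exceed the best baseline iteration (5.64M), and all five exceed the other four baseline iterations (5.02M to 5.10M). The confidence intervals overlap substantially with one fork per configuration, so this run alone does not establish the gain statistically, but the consistent upward shift across the measurement iterations suggests a real effect.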
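
The figures in the table can be rechecked with a few lines of arithmetic. The sketch below recomputes the relative gain, tests whether the two reported intervals overlap, and counts how many cached iterations beat the best baseline iteration. BenchmarkSummary and its helper methods are hypothetical names introduced for illustration; the numbers are copied from the table.

```java
import java.util.Locale;

// Hypothetical helper for sanity-checking the reported benchmark figures.
final class BenchmarkSummary {
  // Relative change of candidate over baseline, e.g. 0.10 for +10%.
  static double relativeGain(double baseline, double candidate) {
    return (candidate - baseline) / baseline;
  }

  // True when [meanA - errA, meanA + errA] and [meanB - errB, meanB + errB] intersect.
  static boolean intervalsOverlap(double meanA, double errA, double meanB, double errB) {
    return meanA - errA <= meanB + errB && meanB - errB <= meanA + errA;
  }

  // Number of values strictly greater than the threshold.
  static int countAbove(double[] values, double threshold) {
    int n = 0;
    for (double v : values) {
      if (v > threshold) {
        n++;
      }
    }
    return n;
  }

  static double max(double[] values) {
    double m = Double.NEGATIVE_INFINITY;
    for (double v : values) {
      m = Math.max(m, v);
    }
    return m;
  }

  public static void main(String[] args) {
    // Per-iteration throughput in millions of ops/s, taken from the table.
    double[] baseline = {5.03, 5.64, 5.02, 5.03, 5.10};
    double[] cached = {5.60, 5.47, 5.71, 5.81, 6.29};

    double gain = relativeGain(5_165_149, 5_776_653);
    System.out.println(String.format(Locale.ROOT, "gain: %.1f%%", gain * 100));
    System.out.println("intervals overlap: "
        + intervalsOverlap(5_165_149, 1_036_100, 5_776_653, 1_215_399));
    System.out.println("cached iterations above best baseline: "
        + countAbove(cached, max(baseline)));
  }
}
```

With a single fork per configuration the overlap check is expected to come out true, which is why the per-iteration comparison carries most of the argument here.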

The bench is adversarial in that every op uses a unique label combination, which defeats reuse of UTF8BytesString instances, so this should be close to a lower bound on the gain. Production workloads with hot-key skew should benefit more, because the cardinality-handler intern pool means the same UTF8BytesString instance gets hashed repeatedly in subsequent reporting cycles.

Test plan

:internal-api:test --tests 'datadog.trace.bootstrap.instrumentation.api.Utf8ByteStringTest' — all 17 cases pass (existing tests already assert utf8String.hashCode() == str.hashCode())
:internal-api:spotlessCheck clean
CI muzzle / integration suites

🤖 Generated with Claude Code

UTF8BytesString.hashCode() currently delegates straight through to String.hashCode() on every call. String already caches its own hash, but the trip out of UTF8BytesString and through String's hash-field check still costs a virtual dispatch + field read + branch on every invocation.

Caches the hash on UTF8BytesString itself once computed. Benign-race pattern, identical to the existing utf8Bytes lazy initializer: two threads computing the same value produce identical results, and int writes are atomic per JLS so a reader can't observe a partial value.

Measured on the metrics subsystem's adversarial JMH bench (8 producer threads, high-cardinality unique-per-op labels), this lifts aggregate throughput from 5.17M to 5.78M ops/s -- ~12% improvement, with the per-iteration distribution shifting systematically upward across all five measurement iterations.

The win is bigger in production-like workloads with repeated keys, since the cardinality-handler intern pool means the same UTF8BytesString instance gets hashed repeatedly in subsequent reporting cycles.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

dd-octo-sts · 2026-05-22T04:54:08Z

Hi! 👋 Thanks for your pull request! 🎉

To help us review it, please make sure to:

Add at least one type, and one component or instrumentation label to the pull request

If you need help, please check our contributing guidelines.

datadog-official · 2026-05-22T04:54:31Z

⚠️ Warnings

🚦 3 Pipeline jobs failed

Check pull requests | Check pull requests

🛟 This job is unlikely to succeed on retry. Please review your pipeline configuration.
Please add at least one type, and one component or instrumentation label to the pull request.

Run system tests | Check system tests success

Run system tests | main / End-to-end #9 / akka-http 9

See error
Test failure in test_blocking_addresses.py:591 - WAF attack assertion failed.

This comment will be updated automatically if new data arrives.

Commit SHA: 8babbbf

dougqh requested a review from a team as a code owner May 22, 2026 04:54

dougqh requested review from mhlidd and removed request for a team May 22, 2026 04:54

dd-octo-sts Bot added the "tag: ai generated" label (Largely based on code generated by an AI or LLM) May 22, 2026

amarziali approved these changes May 22, 2026

dougqh wants to merge 1 commit into master from dougqh/utf8bytesstring-cache-hashcode
