Support OTLP runtime metrics with OTel-native naming by link04 · Pull Request #11318 · DataDog/dd-trace-java

link04 · 2026-05-08T00:34:13Z

What Does This Do

Adds an OTLP runtime-metrics path that emits JVM runtime metrics with OTel semantic-convention names (jvm.*) through the agent's MeterProvider, instead of the proprietary DogStatsD names (jvm.heap_memory, jvm.thread_count, …).

When the three flags below are set together, JvmOtlpRuntimeMetrics.start() is invoked from JMXFetch.run() and registers 15 instruments backed by java.lang.management MXBean callbacks. They flow through the existing OTLP exporter — no new transport. To avoid double-reporting, JMXFetch switches to jmxfetch-config-no-jvm-defaults.yaml (which sets collect_default_jvm_metrics: false) instead of the default config when OTLP runtime metrics are enabled.

JvmOtlpRuntimeMetrics lives in the agent-jmxfetch module. Starting it from JMXFetch.run() lets it ride the same delayed-start path as the rest of JMXFetch, avoiding the JMX side-effects that would occur if it were started from Agent.installDatadogTracer().

| Flag | Required value | Default |
| --- | --- | --- |
| `DD_RUNTIME_METRICS_ENABLED` | `true` | `true` |
| `DD_METRICS_OTEL_ENABLED` | `true` | `false` |
| `DD_METRICS_OTEL_EXPORTER` | `otlp` | unset |
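
The gating described by the three flags can be sketched as below. This is a minimal illustration rather than the tracer's actual configuration code: the class `OtlpRuntimeMetricsGate` and its `enabled` method are hypothetical names, and the sketch reads raw values from a map instead of going through the tracer's own config layer. Exact-match comparison on `otlp` is also an assumption.

```java
import java.util.Map;

// Hypothetical helper for illustration only; not part of dd-trace-java.
final class OtlpRuntimeMetricsGate {
  private OtlpRuntimeMetricsGate() {}

  /** Returns true only when all three flags hold their required values. */
  static boolean enabled(Map<String, String> env) {
    // DD_RUNTIME_METRICS_ENABLED defaults to true, so an unset value counts as enabled.
    boolean runtimeMetrics =
        Boolean.parseBoolean(env.getOrDefault("DD_RUNTIME_METRICS_ENABLED", "true"));
    // DD_METRICS_OTEL_ENABLED defaults to false and must be set explicitly.
    boolean otelMetrics =
        Boolean.parseBoolean(env.getOrDefault("DD_METRICS_OTEL_ENABLED", "false"));
    // DD_METRICS_OTEL_EXPORTER is unset by default and must be "otlp".
    boolean otlpExporter = "otlp".equals(env.get("DD_METRICS_OTEL_EXPORTER"));
    return runtimeMetrics && otelMetrics && otlpExporter;
  }

  public static void main(String[] args) {
    // Evaluate the gate against the real process environment.
    System.out.println(enabled(System.getenv()));
  }
}
```

Because `DD_RUNTIME_METRICS_ENABLED` already defaults to `true`, a deployment only has to set the other two flags to opt in.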

Instruments registered (15 total — Recommended + Development per the OTel JVM semconv):

Memory — jvm.memory.used, jvm.memory.committed, jvm.memory.limit, jvm.memory.init, jvm.memory.used_after_last_gc
Buffer pools — jvm.buffer.memory.used, jvm.buffer.memory.limit, jvm.buffer.count
Threads — jvm.thread.count
Class loading — jvm.class.loaded, jvm.class.count, jvm.class.unloaded
CPU — jvm.cpu.time, jvm.cpu.count, jvm.cpu.recent_utilization
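
As an illustration of what an MXBean-backed callback for `jvm.memory.used` might compute, here is a minimal sketch that uses only `java.lang.management`. It is not the PR's code: the class `JvmMemoryUsedSketch` is a hypothetical name, and aggregating per memory type (rather than reporting per pool) is a simplification made for brevity. The map keys mirror the `jvm.memory.type` attribute values `heap` and `non_heap`.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; the real instruments are registered through the OTel MeterProvider.
final class JvmMemoryUsedSketch {
  private JvmMemoryUsedSketch() {}

  /** Sums the bytes currently used by each memory type across all memory pools. */
  static Map<String, Long> usedByType() {
    Map<String, Long> used = new LinkedHashMap<>();
    used.put("heap", 0L);
    used.put("non_heap", 0L);
    for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
      MemoryUsage usage = pool.getUsage();
      if (usage == null) {
        continue; // getUsage() returns null when the pool is no longer valid
      }
      String type = pool.getType() == MemoryType.HEAP ? "heap" : "non_heap";
      used.merge(type, usage.getUsed(), Long::sum);
    }
    return used;
  }

  public static void main(String[] args) {
    usedByType().forEach((type, bytes) ->
        System.out.println("jvm.memory.used{jvm.memory.type=" + type + "} = " + bytes));
  }
}
```

An observable instrument would call something like `usedByType()` on each collection cycle, so values are read lazily rather than polled on a separate timer.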

jvm.gc.duration is intentionally deferred. The spec requires a Histogram of per-collection pause durations, but GarbageCollectorMXBean only exposes cumulative collection time. Populating the histogram requires either subscribing to GarbageCollectionNotificationInfo via JMX (blocked by the bootstrap-class-loading constraints in docs/bootstrap_design_guidelines.md) or consuming JFR GarbageCollection events. Tracked as a follow-up.
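
The limitation can be seen in a small sketch. `GarbageCollectorMXBean` offers only running totals (`getCollectionCount()` and `getCollectionTime()`), so the most a poller can derive is a mean over an interval, never the individual durations a Histogram needs. The class and method names below are hypothetical and exist only for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical sketch showing what the cumulative MXBean counters can and cannot provide.
final class GcCumulativeSketch {
  private GcCumulativeSketch() {}

  /** Accumulated collection time in milliseconds across all collectors. */
  static long totalCollectionTimeMillis() {
    long total = 0;
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
      long millis = gc.getCollectionTime();
      if (millis > 0) {
        total += millis; // -1 means the collector does not report this value
      }
    }
    return total;
  }

  /** Accumulated number of collections across all collectors. */
  static long totalCollectionCount() {
    long total = 0;
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
      long count = gc.getCollectionCount();
      if (count > 0) {
        total += count; // -1 means the collector does not report this value
      }
    }
    return total;
  }

  /** The only per-collection figure recoverable from totals: an average, not a distribution. */
  static double meanDurationMillis(long totalTimeMillis, long collectionCount) {
    return collectionCount == 0 ? 0.0 : (double) totalTimeMillis / collectionCount;
  }

  public static void main(String[] args) {
    long time = totalCollectionTimeMillis();
    long count = totalCollectionCount();
    System.out.println(count + " collections, " + time + " ms total, mean "
        + meanDurationMillis(time, count) + " ms");
  }
}
```

Three collections of 1 ms, 2 ms and 27 ms produce the same totals as three collections of 10 ms each, which is why the histogram buckets cannot be filled from this data.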

Related system tests PR enabling tests: DataDog/system-tests#6800

Motivation

Customers running with DD_METRICS_OTEL_EXPORTER=otlp route their telemetry to an OTel collector — there may not be a Datadog Agent on the path, and therefore nothing listening on the DogStatsD socket. Today the tracer's runtime metrics still emit through DogStatsD with proprietary names (jvm.heap_memory, …), so in those deployments runtime metrics silently go nowhere.

This change emits the same runtime metric data as OTLP instruments with OTel semantic-convention names through the OTel MeterProvider, so it travels the same OTLP pipeline the customer already configured. Customers who haven't opted into OTLP metrics see no change — the existing DogStatsD path is untouched.

Additional Notes

The DogStatsD runtime-metrics path is unmodified. The two paths can run independently; opting into OTLP doesn't disable DogStatsD.
start() is single-shot: an AtomicBoolean CAS guards against re-entry from re-init, and on failure we log and stop (partial registration is worse than a silent retry).
Uses only java.lang.management.* plus com.sun.management.OperatingSystemMXBean for CPU. CPU instruments are skipped at registration time on JVMs where the com.sun bean isn't present. No javax.management.* is touched, keeping the constraints in docs/bootstrap_design_guidelines.md intact.
agent-jmxfetch depends on otel-bootstrap at build time (compile-only). The OTel API is vendor-repackaged into otel-bootstrap at build time, so it won't conflict with anything in the customer app.
OtelRunnableObservable (new, in otel-bootstrap) provides a Runnable-backed OtelObservable for lambda-style registration; it rate-limits exception logging from the callback.
jmxfetch-config-no-jvm-defaults.yaml is registered as a GraalVM native-image resource in ResourcesFeatureInstrumentation so AOT/native-image builds can load it.
Tests: JvmOtlpRuntimeMetricsTest (JUnit 5, opentelemetry-1.47 module) covers instrument surface, attribute keys (jvm.memory.type=heap|non_heap), positive values for live metrics (jvm.memory.used, jvm.thread.count), and idempotency of repeated start() calls.
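
The single-shot behaviour of start() noted above can be sketched as follows. This is an assumption-laden illustration, not the PR's implementation: `SingleShotStarter` is a hypothetical class, the registration work is passed in as a plain `Runnable`, and logging goes to `System.err` instead of the tracer's logger. The key point is that the flag stays set after a failure, so there is no retry.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of a CAS-guarded, single-shot start.
final class SingleShotStarter {
  private final AtomicBoolean started = new AtomicBoolean(false);
  private final Runnable registration;

  SingleShotStarter(Runnable registration) {
    this.registration = registration;
  }

  /** Returns true only for the one call that ran the registration to completion. */
  boolean start() {
    // compareAndSet lets exactly one caller through, even under concurrent re-init.
    if (!started.compareAndSet(false, true)) {
      return false;
    }
    try {
      registration.run();
      return true;
    } catch (RuntimeException e) {
      // Log and stop: the flag is left set, so later calls do not retry.
      System.err.println("runtime metrics registration failed: " + e);
      return false;
    }
  }

  public static void main(String[] args) {
    SingleShotStarter starter = new SingleShotStarter(() -> System.out.println("registered"));
    starter.start();
    starter.start(); // no-op: registration already attempted
  }
}
```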

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 953c8710a6

mcculls

JvmOtlpRuntimeMetrics needs to be moved under the agent-jmxfetch module. You can then start it from the JMXFetch class, which means you won't need to change anything in the Agent class.

One thought I did have is that we'd still be sending JMXFetch runtime metrics over DogStatsD, and that this is somewhat duplicating the existing metrics managed there. Is there a plan to address this?

BTW another approach would be to write an implementation of AgentStatsdReporter (say AgentOtlpReporter) which sends metrics data from JMXFetch to the OTel API. We could pass it into the embedded JMXFetch service when OTLP is enabled for metrics. This would let you send the existing JMXFetch metrics over OTLP and avoid having 2 different pieces of runtime metrics code to maintain.

(Downside is that a couple of the JMXFetch metrics are different to the OTel metrics, but we might need to converge those anyway.)

mhlidd · 2026-05-13T19:43:32Z

> One thought I did have is that we'd still be sending JMXFetch runtime metrics over DogStatsD, and that this is somewhat duplicating the existing metrics managed there. Is there a plan to address this?

Agreed that this is suboptimal. I'll pass in a new .yaml file w/ collect_default_jvm_metrics: false to override the default JMX metrics that are emitted by JMXFetch.

> BTW another approach would be to write an implementation of AgentStatsdReporter (say AgentOtlpReporter) which sends metrics data from JMXFetch to the OTel API. We could pass it into the embedded JMXFetch service when OTLP is enabled for metrics. This would let you send the existing JMXFetch metrics over OTLP and avoid having 2 different pieces of runtime metrics code to maintain.
>
> (Downside is that a couple of the JMXFetch metrics are different to the OTel metrics, but we might need to converge those anyway.)

@mcculls The problem w/ using JMXFetch is that it is structured to handle DogStatsD metric formats, so to emit the OTel runtime metrics over OTLP we would still need to convert the results JMXFetch returns into what OTLP expects. This means that we would effectively duplicate the OTel runtime metrics information in both a .yaml file and the tracer code that converts the JMXFetch results. IMO this is less optimal than just invoking the OTel Metrics API directly.

dd-octo-sts · 2026-05-18T19:21:20Z

/merge

gh-worker-devflow-routing-ef8351 · 2026-05-18T19:21:25Z

2026-05-18 19:21:25 UTC ℹ️ Start processing command /merge

2026-05-18 19:21:30 UTC ℹ️ MergeQueue: pull request added to the queue

The expected merge time in master is approximately 1h (p90).

2026-05-18 21:22:06 UTC ❌ MergeQueue: The build pipeline has timeout

The merge request has been interrupted because the build 0 took longer than expected. The current limit for the base branch 'master' is 120 minutes.

dd-octo-sts · 2026-05-19T13:49:22Z

/merge

gh-worker-devflow-routing-ef8351 · 2026-05-19T13:49:27Z

2026-05-19 13:49:27 UTC ℹ️ Start processing command /merge

2026-05-19 13:49:32 UTC ℹ️ MergeQueue: pull request added to the queue

The expected merge time in master is approximately 1h (p90).

2026-05-19 15:49:57 UTC ❌ MergeQueue: The build pipeline has timeout

The merge request has been interrupted because the build 0 took longer than expected. The current limit for the base branch 'master' is 120 minutes.

dd-octo-sts · 2026-05-19T17:05:55Z

/merge

gh-worker-devflow-routing-ef8351 · 2026-05-19T17:06:00Z

2026-05-19 17:05:59 UTC ℹ️ Start processing command /merge

2026-05-19 17:06:05 UTC ℹ️ MergeQueue: pull request added to the queue

The expected merge time in master is approximately 1h (p90).

2026-05-19 18:23:40 UTC ℹ️ MergeQueue: This merge request was merged

link04 added 3 commits May 7, 2026 19:33

Adding configs and metrics creation

d308063

Adding test file to check metrics are collected

5e8567f

Doing clean up after testing

4a2901f

link04 added labels May 8, 2026: type: enhancement (Enhancements and improvements), comp: metrics (Metrics), inst: opentelemetry (OpenTelemetry instrumentation), tag: ai generated (Largely based on code generated by an AI or LLM)

link04 added 2 commits May 7, 2026 20:59

Merge branch 'master' into maximo/otlp-runtime-metrics

6a04bcc

Merge branch 'master' into maximo/otlp-runtime-metrics

953c871

link04 marked this pull request as ready for review May 9, 2026 11:02

link04 requested review from a team as code owners May 9, 2026 11:02

link04 requested review from mcculls and removed request for a team May 9, 2026 11:02

chatgpt-codex-connector Bot reviewed May 9, 2026

Comment thread ...t-otel/otel-shim/src/main/java/datadog/opentelemetry/shim/metrics/JvmOtlpRuntimeMetrics.java Outdated

mcculls reviewed May 11, 2026

Comment thread dd-java-agent/agent-bootstrap/src/main/java/datadog/trace/bootstrap/Agent.java Outdated

mcculls reviewed May 11, 2026

Comment thread ...t-otel/otel-shim/src/main/java/datadog/opentelemetry/shim/metrics/JvmOtlpRuntimeMetrics.java Outdated

mcculls previously requested changes May 11, 2026

mhlidd added 3 commits May 14, 2026 13:06

move JvmOtlpRuntimeMetrics.java to agent-jmxfetch

9759292

prevent JMXFetch from emitting jvm metrics when otlp is enabled; migrate from depending on otel-shim to otel-bootstrap

729a87e

update JMXFetch to only emit either OTLP or JMX runtime metrics

5a3a4d1

mcculls self-requested a review May 14, 2026 22:21

mhlidd added 5 commits May 14, 2026 23:44

Merge branch 'master' into maximo/otlp-runtime-metrics

8e79c2e

send otlp_jmx_config when otlp runtime metrics enabled

414112d

update test to assert on guarantees instead of dependent on GC collection

4533351

Merge branch 'master' into maximo/otlp-runtime-metrics

0f455be

adding exception handling for callback

25a6396

mcculls added 3 commits May 18, 2026 09:35

Merge remote-tracking branch 'origin/master' into maximo/otlp-runtime-metrics

d42101d

Minor fixes to use correct storage for observable counters

c13e51e

Cleanup

4f2d638

mcculls approved these changes May 18, 2026

Merge branch 'master' into maximo/otlp-runtime-metrics

972848b

mhlidd added this pull request to the merge queue May 18, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 18, 2026

mhlidd added this pull request to the merge queue May 19, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 19, 2026

mhlidd added this pull request to the merge queue May 19, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 19, 2026

gh-worker-dd-mergequeue-cf854d Bot merged commit 3b1c46e into master May 19, 2026
573 checks passed

gh-worker-dd-mergequeue-cf854d Bot deleted the maximo/otlp-runtime-metrics branch May 19, 2026 18:23

github-actions Bot added this to the 1.63.0 milestone May 19, 2026

gh-worker-dd-mergequeue-cf854d[bot] merged 17 commits into master from maximo/otlp-runtime-metrics

3 participants