Add Additional OTel JVM Runtime Metrics and Gate "Developmental" Metrics by mhlidd · Pull Request #11411 · DataDog/dd-trace-java

mhlidd · 2026-05-18T21:43:47Z

What Does This Do

Follow-up to the parent PR for maximo/otlp-runtime-metrics that expands the OTLP JVM runtime metrics surface and gates Development-status metrics behind a new opt-out flag.

New config

dd.metrics.otel.experimental.enabled (default: true) — mirrors OTel's otel.instrumentation.runtime-telemetry.emit-experimental-metrics. When false, only metrics marked Stable in the OTel JVM semantic conventions are emitted; Development-status metrics are suppressed. Settable via either env var:
- DD_METRICS_OTEL_EXPERIMENTAL_ENABLED (Datadog form)
- OTEL_INSTRUMENTATION_RUNTIME_TELEMETRY_EMIT_EXPERIMENTAL_METRICS (OTel-spec form, mapped through OtelEnvironmentConfigSource)
Both env vars are registered in metadata/supported-configurations.json.
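
A minimal sketch of how the flag could resolve from the two env vars. It assumes the Datadog form wins when both are set, which the description does not state. `ExperimentalFlag` and `resolve` are hypothetical names for illustration; the real lookup goes through the tracer's config sources rather than a plain map.

```java
import java.util.Map;

// Hypothetical helper, not part of dd-trace-java: resolves the experimental
// flag from an environment map so the lookup order is easy to follow.
final class ExperimentalFlag {
  static final String DD_ENV = "DD_METRICS_OTEL_EXPERIMENTAL_ENABLED";
  static final String OTEL_ENV =
      "OTEL_INSTRUMENTATION_RUNTIME_TELEMETRY_EMIT_EXPERIMENTAL_METRICS";
  static final boolean DEFAULT = true; // default proposed in this PR

  private ExperimentalFlag() {}

  static boolean resolve(Map<String, String> env) {
    // Assumption: the Datadog form takes precedence when both are present.
    String raw = env.get(DD_ENV);
    if (raw == null) {
      raw = env.get(OTEL_ENV);
    }
    return raw == null ? DEFAULT : Boolean.parseBoolean(raw.trim());
  }
}
```

Calling `ExperimentalFlag.resolve(System.getenv())` would then return `true` unless one of the two variables is set to something other than `true`.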

Metrics added or reclassified (all under the datadog.jvm.runtime scope, OTel-native names)

| Metric | OTel status | When emitted |
| --- | --- | --- |
| `jvm.memory.used_after_last_gc` | Stable | Always (moved into the always-on memory group) |
| `jvm.gc.duration` | Stable | Always. The `jvm.gc.cause` attribute is gated on the experimental flag (the cause attribute is not in OTel's stable attribute set); `jvm.gc.name` and `jvm.gc.action` are always attached. |
| `jvm.memory.init` | Development | Only when experimental flag is on |
| `jvm.buffer.memory.used` / `limit` / `count` | Development | Only when experimental flag is on |
| `jvm.system.cpu.utilization` | Development | Only when experimental flag is on |
| `jvm.system.cpu.load_1m` | Development | Only when experimental flag is on |
| `jvm.file_descriptor.count` / `limit` | Development | Only when experimental flag is on, and only on Unix-like JVMs (`UnixOperatingSystemMXBean`) |
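
The gating in the table can be sketched as a pure function of the flag and the platform. `MetricGate` is a hypothetical class, not code from this PR, and it lists only the metric names spelled out in full above; the abbreviated `limit` / `count` siblings are left out rather than guessed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the gating rules; not the PR's registration code.
final class MetricGate {
  private MetricGate() {}

  static List<String> emittedMetrics(boolean experimentalEnabled, boolean unixLikeJvm) {
    List<String> names = new ArrayList<>();
    // Stable metrics are registered regardless of the flag.
    names.add("jvm.memory.used_after_last_gc");
    names.add("jvm.gc.duration");
    if (experimentalEnabled) {
      // Development-status metrics need the experimental flag.
      names.add("jvm.memory.init");
      names.add("jvm.buffer.memory.used");
      names.add("jvm.system.cpu.utilization");
      names.add("jvm.system.cpu.load_1m");
      if (unixLikeJvm) {
        // File descriptor metrics also need UnixOperatingSystemMXBean.
        names.add("jvm.file_descriptor.count");
      }
    }
    return names;
  }

  static List<String> gcDurationAttributes(boolean experimentalEnabled) {
    List<String> keys = new ArrayList<>();
    keys.add("jvm.gc.name");
    keys.add("jvm.gc.action");
    if (experimentalEnabled) {
      keys.add("jvm.gc.cause"); // not in OTel's stable attribute set
    }
    return keys;
  }
}
```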

Value-guard alignment with OTel reference implementation

jvm.memory.limit and jvm.memory.init now skip recording only when getMax() / getInit() returns the documented -1 sentinel. The previous guard required a value > 0, which incorrectly also skipped legitimate 0 values.
All other per-metric guards (>= 0, null checks) match the corresponding callbacks in io.opentelemetry.instrumentation.runtimetelemetry.internal.*.
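
To make the difference concrete, here is a small sketch of the two guards side by side. `MemoryGuards` and its method names are hypothetical; only `java.lang.management.MemoryUsage` and its -1 "undefined" sentinel come from the JDK.

```java
import java.lang.management.MemoryUsage;

// Hypothetical comparison of the previous and aligned guards.
final class MemoryGuards {
  private MemoryGuards() {}

  // Previous behavior: drops the -1 sentinel but also a legitimate 0.
  static boolean previousGuard(long value) {
    return value > 0;
  }

  // Aligned behavior: only the documented -1 sentinel is skipped.
  static boolean alignedGuard(long value) {
    return value != -1;
  }

  // Decides whether a jvm.memory.limit data point would be recorded.
  static boolean shouldRecordLimit(MemoryUsage usage) {
    return alignedGuard(usage.getMax());
  }
}
```

With these definitions a pool reporting `getMax() == 0` is recorded under the aligned guard and dropped under the previous one, while `-1` is skipped by both.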

Test coverage

JvmOtlpRuntimeMetricsTest was extended to assert all newly added metric names are registered (with platform-conditional checks for the Unix-only file descriptor metrics) and to cover jvm.gc.duration emission via System.gc().
New JvmOtlpRuntimeMetricsForkedTest runs in an isolated JVM, calls start(false), and verifies that Development-status instruments are absent and that jvm.gc.cause is not attached to jvm.gc.duration data points when experimental metrics are disabled. Forked because JvmOtlpRuntimeMetrics.start(...) is guarded by a process-wide AtomicBoolean and the registry / JMX listeners are JVM-global, so a single JVM cannot exercise both flag values.
Removed a weak startIsIdempotent test that only checked the metric-name Set size — it could not detect duplicate JMX listeners or duplicate observable callbacks under the same instrument name, which are the actual failure modes if the guard were removed.
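
The reason for forking can be illustrated with a stripped-down start guard. `StartOnce` is a hypothetical stand-in for the guarded `start(...)` method, assuming the guard is a `compareAndSet` on a static `AtomicBoolean` as described; it is not the PR's code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a process-wide start guard.
final class StartOnce {
  private static final AtomicBoolean STARTED = new AtomicBoolean();
  private static volatile boolean experimentalAtStart;

  private StartOnce() {}

  // Returns true only for the first call in this JVM; later calls are no-ops.
  static boolean start(boolean experimentalEnabled) {
    if (!STARTED.compareAndSet(false, true)) {
      return false;
    }
    experimentalAtStart = experimentalEnabled;
    return true;
  }

  static boolean experimentalAtStart() {
    return experimentalAtStart;
  }
}
```

Once `start(false)` has run, a later `start(true)` in the same JVM changes nothing, so the second flag value can only be tested in a separate process.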

Misc

Extracted sunOsBean() helper to remove duplicated instanceof OperatingSystemMXBean cast logic between registerCpuMetrics() and the new registerSystemCpuMetrics().
Added debug logs when an MXBean isn't available so it's obvious why a metric didn't show up.

Motivation

The parent PR established the OTLP JVM runtime metrics pipeline but only emitted a subset of the OTel JVM semantic conventions. This follow-up brings the surface in line with what opentelemetry-java-instrumentation's runtime-telemetry library emits, and adds the standard experimental-metrics opt-out so users who want only the Stable subset (smaller cardinality, fewer dashboard surprises) can disable Development metrics without losing the integration entirely.

Aligning the value guards with OTel's reference implementation prevents two real-world divergences:

Without the 0 vs -1 fix, uncapped non-heap pools (where getMax() == 0 on some JVM/version combos) would silently produce no jvm.memory.limit data point — they should publish 0 to indicate "no limit observed."
The experimental gate ensures dashboards built against OTel's stable-only output won't differ between OTel SDK collection and DD-agent collection.

Additional Notes

No change to JMXFetch behavior beyond passing the new flag through JvmOtlpRuntimeMetrics.start(...). The OTLP_JMX_CONFIG-skip path is unchanged.
The OTel-spec setting otel.instrumentation.runtime-telemetry.emit-experimental-metrics (env var form OTEL_INSTRUMENTATION_RUNTIME_TELEMETRY_EMIT_EXPERIMENTAL_METRICS) is captured in OtelEnvironmentConfigSource, so an unmodified OTel-style config picks up the flag automatically.

Contributor Checklist

Format the title according to the contribution guidelines
Assign the type: and (comp: or inst:) labels in addition to any other useful labels
Avoid using close, fix, or any linking keywords when referencing an issue
Use solves instead, and assign the PR milestone to the issue
Update the CODEOWNERS file on source file addition, migration, or deletion
Update public documentation with any new configuration flags or behaviors

Jira ticket: [PROJ-IDENT]

mhlidd · 2026-05-18T21:59:41Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 62d9b50d1d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you:

- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review"

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

datadog-prod-us1-5 · 2026-05-19T18:23:45Z

⚠️ Warnings

🚦 1 Pipeline job failed

Check pull requests | Check pull requests

🛟 This job is unlikely to succeed on retry. Please review your pipeline configuration.
Label validation failed: Please add at least one type, and one component or instrumentation label to the pull request.

This comment will be updated automatically if new data arrives.

🔗 Commit SHA: 40ef357

dd-octo-sts · 2026-05-20T19:53:34Z

Hi! 👋 Thanks for your pull request! 🎉

To help us review it, please make sure to:

Add at least one type, and one component or instrumentation label to the pull request

If you need help, please check our contributing guidelines.

mhlidd · 2026-05-20T20:13:49Z

@codex review

chatgpt-codex-connector · 2026-05-20T20:20:44Z

Codex Review: Didn't find any major issues. Another round soon, please!

ValentinZakharov

Could you clarify whether the following differences from the JVM semantic conventions are intentional?

jvm.thread.count seems to be missing the recommended attributes thread.daemon and thread.state (spec)
jvm.memory.init is not split by memory pool and seems to be missing the jvm.memory.pool.name attribute (spec)

ValentinZakharov · 2026-05-21T09:55:24Z

+
+  private static void recordGcDuration(
+      OtelMetricStorage storage, GarbageCollectionNotificationInfo info, boolean captureGcCause) {
+    double durationSeconds = info.getGcInfo().getDuration() / 1000d;


We should probably add a null check in case info doesn’t contain GC info

I'd suggest adding that check in handleNotification before calling recordGcDuration
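
One possible shape for that guard, as a sketch only: `GcDurations` and `durationSeconds` are hypothetical names, and the real check would sit in handleNotification as suggested. `com.sun.management.GcInfo` is the JDK type returned by `getGcInfo()`.

```java
import com.sun.management.GcInfo;
import java.util.OptionalDouble;

// Hypothetical helper showing a null-safe duration conversion.
final class GcDurations {
  private GcDurations() {}

  // Returns the GC duration in seconds, or empty when no GcInfo is attached.
  static OptionalDouble durationSeconds(GcInfo gcInfo) {
    if (gcInfo == null) {
      return OptionalDouble.empty();
    }
    // GcInfo.getDuration() is in milliseconds.
    return OptionalDouble.of(gcInfo.getDuration() / 1000d);
  }
}
```

The caller would skip recording whenever the result is empty instead of hitting a NullPointerException.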

mcculls · 2026-05-22T15:07:17Z

-        ManagementFactory.getOperatingSystemMXBean();
-    OperatingSystemMXBean osBean =
-        rawOsBean instanceof OperatingSystemMXBean ? (OperatingSystemMXBean) rawOsBean : null;
+    OperatingSystemMXBean osBean = sunOsBean();


maybe findOsBean() ?

mcculls · 2026-05-22T15:08:48Z

+          false,
+          GarbageCollectorMXBean.class.getClassLoader());
+      return true;
+    } catch (ClassNotFoundException e) {


I'd widen this to catch Exception or Throwable
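
A sketch of the widened probe, assuming the surrounding method only needs a yes/no answer. `ClassProbe` is a hypothetical name; `Class.forName(String, boolean, ClassLoader)` is the standard JDK call seen in the diff.

```java
// Hypothetical availability probe with a widened catch.
final class ClassProbe {
  private ClassProbe() {}

  static boolean isAvailable(String className) {
    try {
      // initialize=false: only check that the class can be loaded.
      Class.forName(className, false, ClassProbe.class.getClassLoader());
      return true;
    } catch (Throwable t) {
      // Covers ClassNotFoundException as well as LinkageError subclasses
      // such as NoClassDefFoundError, which the narrower catch lets escape.
      return false;
    }
  }
}
```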

mcculls · 2026-05-22T15:35:00Z

  static final int DEFAULT_METRICS_OTEL_TIMEOUT = 7_500; // ms
  static final int DEFAULT_METRICS_OTEL_CARDINALITY_LIMIT = 2_000;

+  public static final boolean DEFAULT_METRICS_OTEL_EXPERIMENTAL_ENABLED = true;


Default for this in OTel is false - do we want to match that?

https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation/runtime-telemetry/README.md

mcculls · 2026-05-22T15:40:38Z

+    }
+
+    java.lang.management.OperatingSystemMXBean stdOsBean =
+        ManagementFactory.getOperatingSystemMXBean();


There are quite a few calls to ManagementFactory.getOperatingSystemMXBean() here: some use sunOsBean(), which returns null if it's not the right type, while other places have their own instanceof checks.

It might actually be more readable and consistent to just get the MBean with ManagementFactory.getOperatingSystemMXBean(); everywhere and check+cast it to the right type. The sunOsBean() helper doesn't really add much IMHO.
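
A sketch of the check-and-cast style described here, without a shared helper. `OsBeanProbe` and the -1 fallback are assumptions for illustration; the `com.sun.management` bean interfaces and their getters are JDK APIs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical probe: fetch the standard bean, then check and cast at each use.
final class OsBeanProbe {
  private OsBeanProbe() {}

  // Open file descriptor count, or -1 when the Unix-specific bean is absent.
  static long openFileDescriptors() {
    OperatingSystemMXBean osBean = ManagementFactory.getOperatingSystemMXBean();
    if (osBean instanceof com.sun.management.UnixOperatingSystemMXBean) {
      return ((com.sun.management.UnixOperatingSystemMXBean) osBean)
          .getOpenFileDescriptorCount();
    }
    return -1;
  }

  // Process CPU load, or -1 when the com.sun.management bean is absent.
  static double processCpuLoad() {
    OperatingSystemMXBean osBean = ManagementFactory.getOperatingSystemMXBean();
    if (osBean instanceof com.sun.management.OperatingSystemMXBean) {
      return ((com.sun.management.OperatingSystemMXBean) osBean).getProcessCpuLoad();
    }
    return -1;
  }
}
```

Each call site stays self-explanatory about which bean type it needs, at the cost of repeating the lookup.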

mcculls

One question about whether the default should really be true since OTel defaults it to false at the moment: https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation/runtime-telemetry/README.md

Also a few cleanup / robustness comments to be addressed before merging - otherwise looks good.

chatgpt-codex-connector Bot reviewed May 18, 2026

Comment thread ...utils/src/main/java/datadog/trace/bootstrap/config/provider/OtelEnvironmentConfigSource.java

Base automatically changed from maximo/otlp-runtime-metrics to master May 19, 2026 18:23

mhlidd added 4 commits May 20, 2026 14:18

init

82c7398

update checks to match OTel checks

b2a3ff6

adding jvm.gc.duration

cbdabdd

adding tests for experimental off

de166ab

mhlidd force-pushed the mhlidd/otlp_runtime_metrics_follow_up branch from 90ddfc2 to de166ab Compare May 20, 2026 18:24

removing unnecessary test and adding configs

40ef357

mhlidd changed the title ~~init~~ Add Additional OTel JVM Runtime Metrics and Gate "Developmental" Metrics May 20, 2026

mhlidd marked this pull request as ready for review May 20, 2026 19:53

mhlidd requested review from a team as code owners May 20, 2026 19:53

mhlidd requested review from ValentinZakharov, bric3 and mcculls and removed request for a team May 20, 2026 19:53

mhlidd added the type: enhancement, inst: opentelemetry, and tag: ai generated labels May 20, 2026

ValentinZakharov reviewed May 21, 2026

mcculls reviewed May 22, 2026

mcculls approved these changes May 22, 2026
