feat: JVM metrics via hsperfdata (no hooking) #384
- Add nodemon /container/jvm-metrics endpoint backed by hsperfdata
- Extract heap used/size/max + GC/safepoint counters
- Extract flags from cmdline + JAVA_TOOL_OPTIONS/JDK_JAVA_OPTIONS/JAVA_OPTS with source attribution
- Guard permissions behind jvmMetrics.enabled (hostPID + runAsUser=0 + SYS_PTRACE)
- Integrate JVM metrics into container resource collector (mirrors GPU nodemon flow)
- Add Kind workflow test covering JVM app matrix
Update:
Commit: f076b1e
Code Review ✅ Approved: 3 resolved / 3 findings
Implements JVM metrics scraping via hsperfdata to expose heap, GC, and flag data without requiring JMX or javaagents. Resolved issues include incorrect safepoint time units, cgroup v2 regex path matching, and dead code removal.
✅ 3 resolved
✅ Bug: Safepoint times divided by wrong frequency (ns vs ms)
✅ Edge Case: Cgroup v2 bare container ID paths not matched by regex
✅ Quality: Unused isJavaProcess function (dead code)
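To illustrate the cgroup v2 edge case above, here is a minimal Python sketch of container-ID extraction from /proc/<pid>/cgroup content. The pattern, the function name container_id_from_cgroup, and the sample path layouts are assumptions for illustration, not the regex used in this PR. It only shows that a bare 64-hex final path component should match alongside the runtime-prefixed .scope form.

```python
import re

# Hypothetical pattern (not the one from this PR): a 64-hex container ID at the
# end of the cgroup path, optionally followed by ".scope". The lookbehind keeps
# it from matching the tail of a longer hex string.
CONTAINER_ID_RE = re.compile(r"(?<![0-9a-f])([0-9a-f]{64})(?:\.scope)?$")


def container_id_from_cgroup(text):
    """Return the first container ID found in /proc/<pid>/cgroup content, else None."""
    for line in text.splitlines():
        # Each line is "hierarchy-ID:controller-list:path"; cgroup v2 uses "0::<path>".
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        match = CONTAINER_ID_RE.search(parts[2])
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    cid = "ab12" * 16  # placeholder 64-hex ID
    # Assumed layouts: bare ID (cgroupfs driver) and systemd-style scope unit.
    print(container_id_from_cgroup("0::/kubepods/burstable/pod1234/" + cid))
    print(container_id_from_cgroup("0::/kubepods.slice/cri-containerd-" + cid + ".scope"))
    print(container_id_from_cgroup("0::/init.scope"))
```

Anchoring the match at the end of the path rather than on a runtime-specific prefix is one way to cover both layouts with a single expression. A real implementation may need to be stricter.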
Summary
Adds JVM metrics discovery + export in zxporter-nodemon without attach/JMX/javaagent by reading HotSpot hsperfdata via /proc + cgroups.
What's included
- Endpoint: GET /container/jvm-metrics
- Flag extraction from /proc/<pid>/cmdline and /proc/<pid>/environ (JAVA_TOOL_OPTIONS, JDK_JAVA_OPTIONS, JAVA_OPTS)
- flag_sources attribution (cmdline vs which env var)
- Permissions guarded behind jvmMetrics.enabled: hostPID: true, runAsUser: 0, and CAP_SYS_PTRACE
- These permissions cover reading /proc/<pid>/root/tmp/hsperfdata_*/* and, in practice, /proc/<pid>/environ
- Attribution of each JVM to (namespace, pod, container)
- Integration into the container resource collector (ContainerMetricsSnapshot)
Smoke test matrix (Kind): full req/resp bodies
All examples below were captured from a Kind cluster with the nodemon DaemonSet deployed with jvmMetrics.enabled=true.
Common request (fetch all JVM metrics for namespace)
Request:
curl -sS 'http://127.0.0.1:6061/container/jvm-metrics?namespace=jvm-spike'
Response (full JSON array):
[
  {
    "node_name": "jvm-spike-control-plane",
    "pod": "java-sleeper-5d444487b8-mv48f",
    "namespace": "jvm-spike",
    "container": "app",
    "container_id": "2b31ba56f3a6a14491cd381a224c5c02f05fae85aa4c6dae7685edcd090aa880",
    "pid_host": 2326,
    "pid_ns": 1,
    "java_command": "DummyMain",
    "java_version": "21.0.11",
    "heap_size_bytes": 134287360,
    "heap_used_bytes": 0,
    "heap_max_size_bytes": 268435456,
    "gc_time_seconds_total": { "Serial full collection pauses": 0.001146047, "Serial young collection pauses": 0.000271418 },
    "safepoint_time_seconds_total": 0.001670841,
    "safepoint_sync_time_seconds_total": 0.000010959,
    "flags_extracted": { "xms_bytes": 67108864, "xmx_bytes": 268435456, "max_ram_percentage": 70, "use_container_support": true },
    "flag_sources": { "xms_bytes": "cmdline", "xmx_bytes": "cmdline", "max_ram_percentage": "cmdline", "use_container_support": "cmdline" },
    "raw_cmdline": "java -Xms64m -Xmx256m -XX:MaxRAMPercentage=70 -XX:+UseContainerSupport -cp /tmp DummyMain -Dexample.tool.options=true",
    "timestamp": "2026-05-25T18:01:06.828335777Z"
  },
  {
    "node_name": "jvm-spike-control-plane",
    "pod": "java8-xmx-only-5b69c64479-tf2qx",
    "namespace": "jvm-spike",
    "container": "app",
    "container_id": "6945383b4828b08a0fc77b81d90a4752cbcc85300d776b799677c1a460a0325b",
    "pid_host": 30009,
    "pid_ns": 1,
    "java_command": "DummyMain",
    "java_version": "1.8.0_492",
    "heap_size_bytes": 42078208,
    "heap_used_bytes": 0,
    "heap_max_size_bytes": 201326592,
    "gc_time_seconds_total": { "Copy": 0.000289087, "MSC": 0.000437005 },
    "safepoint_time_seconds_total": 0.011397506,
    "safepoint_sync_time_seconds_total": 0.000032208,
    "flags_extracted": { "xmx_bytes": 201326592 },
    "flag_sources": { "xmx_bytes": "cmdline" },
    "raw_cmdline": "java -Xmx192m -cp /tmp DummyMain",
    "timestamp": "2026-05-25T18:01:06.828466529Z"
  },
  {
    "node_name": "jvm-spike-control-plane",
    "pod": "java17-maxram-only-7794b6cf4f-gvhjf",
    "namespace": "jvm-spike",
    "container": "app",
    "container_id": "c9fefdee55607a540f6174fab2058e6775d3de5db3da427ce896c7e1f080b1fd",
    "pid_host": 30077,
    "pid_ns": 1,
    "java_command": "DummyMain",
    "java_version": "17.0.19",
    "heap_size_bytes": 16912384,
    "heap_used_bytes": 0,
    "heap_max_size_bytes": 216006656,
    "gc_time_seconds_total": { "Serial full collection pauses": 0.000466714, "Serial young collection pauses": 0.000358171 },
    "safepoint_time_seconds_total": 0.000893928,
    "safepoint_sync_time_seconds_total": 0.000013291,
    "flags_extracted": { "max_ram_percentage": 40, "use_container_support": true },
    "flag_sources": { "max_ram_percentage": "cmdline", "use_container_support": "cmdline" },
    "raw_cmdline": "java -XX:MaxRAMPercentage=40 -XX:+UseContainerSupport -cp /tmp DummyMain",
    "timestamp": "2026-05-25T18:01:06.828589447Z"
  },
  {
    "node_name": "jvm-spike-control-plane",
    "pod": "java11-tool-options-5df9bfcb98-twtc9",
    "namespace": "jvm-spike",
    "container": "app",
    "container_id": "5d7c501a26cf5f3b440d8a726c45418b8dfca199b6314d3f52c556f5189c4a53",
    "pid_host": 30153,
    "pid_ns": 1,
    "java_command": "DummyMain",
    "java_version": "11.0.31",
    "heap_size_bytes": 50331648,
    "heap_used_bytes": 0,
    "heap_max_size_bytes": 167772160,
    "gc_time_seconds_total": { "Copy": 0, "MSC": 0 },
    "safepoint_time_seconds_total": 0.000294419,
    "safepoint_sync_time_seconds_total": 0.000093209,
    "flags_extracted": { "xms_bytes": 50331648, "xmx_bytes": 167772160, "max_ram_percentage": 65 },
    "flag_sources": { "xms_bytes": "JAVA_TOOL_OPTIONS", "xmx_bytes": "JAVA_TOOL_OPTIONS", "max_ram_percentage": "JAVA_TOOL_OPTIONS" },
    "raw_cmdline": "java -cp /tmp DummyMain -Xms48m -Xmx160m -XX:MaxRAMPercentage=65 -Dfrom=tooloptions",
    "timestamp": "2026-05-25T18:01:06.829114328Z"
  }
]
Case A: java8 cmdline heap sizing (-Xmx192m)
Workload: jvm-spike/deploy/java8-xmx-only
Request:
Response (full object):
{
  "container": "app",
  "container_id": "6945383b4828b08a0fc77b81d90a4752cbcc85300d776b799677c1a460a0325b",
  "flag_sources": { "xmx_bytes": "cmdline" },
  "flags_extracted": { "xmx_bytes": 201326592 },
  "gc_time_seconds_total": { "Copy": 0.000289087, "MSC": 0.000437005 },
  "heap_max_size_bytes": 201326592,
  "heap_size_bytes": 42078208,
  "heap_used_bytes": 0,
  "java_command": "DummyMain",
  "java_version": "1.8.0_492",
  "namespace": "jvm-spike",
  "node_name": "jvm-spike-control-plane",
  "pid_host": 30009,
  "pid_ns": 1,
  "pod": "java8-xmx-only-5b69c64479-tf2qx",
  "raw_cmdline": "java -Xmx192m -cp /tmp DummyMain",
  "safepoint_sync_time_seconds_total": 3.2208e-05,
  "safepoint_time_seconds_total": 0.011397506,
  "timestamp": "2026-05-25T18:01:06.828466529Z"
}
Case B: java11 env-injected options (JAVA_TOOL_OPTIONS)
Workload: jvm-spike/deploy/java11-tool-options
Request:
Response (full object):
{
  "container": "app",
  "container_id": "5d7c501a26cf5f3b440d8a726c45418b8dfca199b6314d3f52c556f5189c4a53",
  "flag_sources": { "max_ram_percentage": "JAVA_TOOL_OPTIONS", "xms_bytes": "JAVA_TOOL_OPTIONS", "xmx_bytes": "JAVA_TOOL_OPTIONS" },
  "flags_extracted": { "max_ram_percentage": 65, "xms_bytes": 50331648, "xmx_bytes": 167772160 },
  "gc_time_seconds_total": { "Copy": 0, "MSC": 0 },
  "heap_max_size_bytes": 167772160,
  "heap_size_bytes": 50331648,
  "heap_used_bytes": 0,
  "java_command": "DummyMain",
  "java_version": "11.0.31",
  "namespace": "jvm-spike",
  "node_name": "jvm-spike-control-plane",
  "pid_host": 30153,
  "pid_ns": 1,
  "pod": "java11-tool-options-5df9bfcb98-twtc9",
  "raw_cmdline": "java -cp /tmp DummyMain -Xms48m -Xmx160m -XX:MaxRAMPercentage=65 -Dfrom=tooloptions",
  "safepoint_sync_time_seconds_total": 9.3209e-05,
  "safepoint_time_seconds_total": 0.000294419,
  "timestamp": "2026-05-25T18:01:06.829114328Z"
}
Key assertions visible in response:
- flags_extracted.xmx_bytes = 167772160 (160MiB)
- flag_sources.xmx_bytes = "JAVA_TOOL_OPTIONS"
Case C: java17 percentage-based heap sizing (-XX:MaxRAMPercentage=40)
Workload: jvm-spike/deploy/java17-maxram-only
Request:
Response (full object):
{
  "container": "app",
  "container_id": "c9fefdee55607a540f6174fab2058e6775d3de5db3da427ce896c7e1f080b1fd",
  "flag_sources": { "max_ram_percentage": "cmdline", "use_container_support": "cmdline" },
  "flags_extracted": { "max_ram_percentage": 40, "use_container_support": true },
  "gc_time_seconds_total": { "Serial full collection pauses": 0.000466714, "Serial young collection pauses": 0.000358171 },
  "heap_max_size_bytes": 216006656,
  "heap_size_bytes": 16912384,
  "heap_used_bytes": 0,
  "java_command": "DummyMain",
  "java_version": "17.0.19",
  "namespace": "jvm-spike",
  "node_name": "jvm-spike-control-plane",
  "pid_host": 30077,
  "pid_ns": 1,
  "pod": "java17-maxram-only-7794b6cf4f-gvhjf",
  "raw_cmdline": "java -XX:MaxRAMPercentage=40 -XX:+UseContainerSupport -cp /tmp DummyMain",
  "safepoint_sync_time_seconds_total": 1.3291e-05,
  "safepoint_time_seconds_total": 0.000893928,
  "timestamp": "2026-05-25T18:01:06.828589447Z"
}
Key assertions visible in response:
- flags_extracted.max_ram_percentage = 40
- flag_sources.max_ram_percentage = "cmdline"
Case D: baseline java-sleeper (cmdline + percent)
Workload: jvm-spike/deploy/java-sleeper
Request:
Response (full object):
{
  "container": "app",
  "container_id": "2b31ba56f3a6a14491cd381a224c5c02f05fae85aa4c6dae7685edcd090aa880",
  "flag_sources": { "max_ram_percentage": "cmdline", "use_container_support": "cmdline", "xms_bytes": "cmdline", "xmx_bytes": "cmdline" },
  "flags_extracted": { "max_ram_percentage": 70, "use_container_support": true, "xms_bytes": 67108864, "xmx_bytes": 268435456 },
  "gc_time_seconds_total": { "Serial full collection pauses": 0.001146047, "Serial young collection pauses": 0.000271418 },
  "heap_max_size_bytes": 268435456,
  "heap_size_bytes": 134287360,
  "heap_used_bytes": 0,
  "java_command": "DummyMain",
  "java_version": "21.0.11",
  "namespace": "jvm-spike",
  "node_name": "jvm-spike-control-plane",
  "pid_host": 2326,
  "pid_ns": 1,
  "pod": "java-sleeper-5d444487b8-mv48f",
  "raw_cmdline": "java -Xms64m -Xmx256m -XX:MaxRAMPercentage=70 -XX:+UseContainerSupport -cp /tmp DummyMain -Dexample.tool.options=true",
  "safepoint_sync_time_seconds_total": 1.0959e-05,
  "safepoint_time_seconds_total": 0.001670841,
  "timestamp": "2026-05-25T18:01:06.828335777Z"
}
CI / Integration test coverage
Adds a Kind workflow that deploys the same JVM workload matrix and asserts:
- java8-xmx-only: xmx_bytes == 201326592
- java11-tool-options: xmx/xms/max_ram_percentage extracted and flag_sources mentions JAVA_TOOL_OPTIONS
- java17-maxram-only: max_ram_percentage == 40
- not-java: absent from JVM metrics output
Workflow: .github/workflows/jvm-metrics-kind-test.yml
Fixture: test/fixtures/jvm-apps.yaml
Security/ops notes
jvmMetrics.enabled=false by default.
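The flag extraction and flag_sources attribution from the summary can be sketched as follows. This is a minimal Python illustration, not the nodemon implementation: the helper names (parse_size, parse_flags, extract_flags), the rule that cmdline takes precedence over env vars, and the env var scan order are all assumptions.

```python
import re
import shlex

# Env vars named in this PR; the scan order used here is an assumption.
ENV_SOURCES = ("JAVA_TOOL_OPTIONS", "JDK_JAVA_OPTIONS", "JAVA_OPTS")
UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_size(text):
    """Convert a JVM size such as '192m' to bytes; return None if malformed."""
    m = re.fullmatch(r"(\d+)([kmgt]?)", text.lower())
    return int(m.group(1)) * UNITS[m.group(2)] if m else None


def parse_flags(tokens):
    """Extract heap-related flags from option tokens.

    Simplification: every token is inspected, including any that follow
    the main class name.
    """
    flags = {}
    for tok in tokens:
        if tok.startswith(("-Xms", "-Xmx")):
            size = parse_size(tok[4:])
            if size is not None:
                flags["xms_bytes" if tok[3] == "s" else "xmx_bytes"] = size
        elif tok.startswith("-XX:MaxRAMPercentage="):
            try:
                flags["max_ram_percentage"] = float(tok.split("=", 1)[1])
            except ValueError:
                pass
        elif tok in ("-XX:+UseContainerSupport", "-XX:-UseContainerSupport"):
            flags["use_container_support"] = tok[4] == "+"
    return flags


def extract_flags(cmdline_tokens, env):
    """Return (flags_extracted, flag_sources); the first source to set a flag keeps it."""
    flags, sources = {}, {}
    layers = [("cmdline", cmdline_tokens)]
    layers += [(name, shlex.split(env[name])) for name in ENV_SOURCES if name in env]
    for source, tokens in layers:
        for key, value in parse_flags(tokens).items():
            if key not in flags:
                flags[key] = value
                sources[key] = source
    return flags, sources


if __name__ == "__main__":
    env = {"JAVA_TOOL_OPTIONS": "-Xms48m -Xmx160m -XX:MaxRAMPercentage=65 -Dfrom=tooloptions"}
    print(extract_flags(["java", "-cp", "/tmp", "DummyMain"], env))
```

Fed the java-sleeper command line from the smoke test, this sketch yields xmx_bytes = 268435456 attributed to cmdline. With options supplied only through JAVA_TOOL_OPTIONS, as in the java11-tool-options workload, the same keys are attributed to that variable instead.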