Fixes #26555: Fix db_connections_total Prometheus metric always zero#27022
Fixes #26555: Fix db_connections_total Prometheus metric always zero#27022RajdeepKushwaha5 wants to merge 10 commits intoopen-metadata:mainfrom
Conversation
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
There was a problem hiding this comment.
Pull request overview
Fixes the db_connections_total Prometheus metric that previously stayed at 0.0 by replacing an unused Counter with a Gauge that reflects HikariCP pool size (active + idle), aligning the Prometheus view with the Dropwizard database.pool.TotalConnections value.
Changes:
- Replace
db.connections.totalfrom a deadCounterto aGaugecomputed from HikariCP Micrometer gauges (hikaricp.connections.active+hikaricp.connections.idle). - Remove the unused
incrementDatabaseConnections()path. - Add unit tests validating the new gauge behavior with and without HikariCP gauges present.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| openmetadata-service/src/main/java/org/openmetadata/service/monitoring/OpenMetadataMetrics.java | Switches db.connections.total to a Gauge derived from HikariCP pool gauges and removes dead Counter increment code. |
| openmetadata-service/src/test/java/org/openmetadata/service/monitoring/OpenMetadataMetricsTest.java | Adds tests verifying db.connections.total reflects active+idle and returns 0 when Hikari metrics are absent. |
3b7d4be to
58bb55e
Compare
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
| Gauge.builder("db.connections", () -> poolTotalConnections()) | ||
| .description("Total connections in the database connection pool") | ||
| .register(meterRegistry); |
There was a problem hiding this comment.
The PR/issue targets the Prometheus metric db_connections_total, but this change registers a gauge named db.connections instead of keeping db.connections.total. With Micrometer’s Prometheus naming, this will expose db_connections (gauge) and the original db_connections_total metric will disappear/rename, which likely breaks existing dashboards and doesn’t match the issue reproduction/expectation. Consider keeping the original metric name (db.connections.total) and switching only the meter type to a gauge (or publish both names temporarily for backward compatibility).
| @Test | ||
| void dbConnectionsTotalReflectsHikariCPPoolSize() { | ||
| AtomicInteger activeConnections = new AtomicInteger(5); | ||
| AtomicInteger idleConnections = new AtomicInteger(15); | ||
|
|
||
| Gauge.builder("hikaricp.connections.active", activeConnections, AtomicInteger::doubleValue) | ||
| .register(registry); | ||
| Gauge.builder("hikaricp.connections.idle", idleConnections, AtomicInteger::doubleValue) | ||
| .register(registry); | ||
|
|
||
| new OpenMetadataMetrics(registry); | ||
|
|
||
| Gauge total = registry.find("db.connections").gauge(); | ||
| assertNotNull(total, "db.connections gauge should be registered"); | ||
| assertEquals(20.0, total.value(), 0.01, "Should equal active + idle"); |
There was a problem hiding this comment.
These tests validate the computed value, but they don’t assert the Prometheus-exposed metric name/type (the original bug report is about /prometheus output). Adding a PrometheusMeterRegistry-based assertion (scrape contains db_connections_total with TYPE gauge) would catch naming/registry-conversion regressions and ensure the fix matches the reported behavior.
🟡 Playwright Results — all passed (19 flaky)✅ 3954 passed · ❌ 0 failed · 🟡 19 flaky · ⏭️ 86 skipped
🟡 19 flaky test(s) (passed on retry)
How to debug locally# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip # view trace |
… scrape test - Restore gauge name to 'db.connections.total' so Micrometer's Prometheus naming convention exposes it as 'db_connections_total', preserving backward compatibility with existing dashboards. - Add prometheusScrapExposesDbConnectionsTotal() test that uses PrometheusMeterRegistry to verify the Prometheus scrape output contains 'db_connections_total' as a gauge type.
…uffix The new Prometheus Java client (used by PrometheusMeterRegistry in micrometer 1.14.5) strips the _total suffix from metric names because it is reserved for counters. The gauge db.connections.total was being sanitized to db_connections, causing the test to fail when looking for db_connections_total in the scrape output. Renamed to db.pool.connections which exports as db_pool_connections in Prometheus format, avoiding suffix conflicts.
9b62fe2 to
105c956
Compare
Code Review ✅ Approved 2 resolved / 2 findingsFixes the db_connections_total Prometheus metric reporting by standardizing naming and correcting the test method typo. No issues remain. ✅ 2 resolved✅ Quality: Metric name inconsistent with sibling counters
✅ Quality: Typo in test method name: 'Scrap' should be 'Scrape'
OptionsDisplay: compact → Showing less information. Comment with these commands to change:
Was this helpful? React with 👍 / 👎 | Gitar |
| private double poolTotalConnections() { | ||
| Gauge active = meterRegistry.find("hikaricp.connections.active").gauge(); | ||
| Gauge idle = meterRegistry.find("hikaricp.connections.idle").gauge(); | ||
| double activeVal = active != null ? active.value() : 0.0; | ||
| double idleVal = idle != null ? idle.value() : 0.0; | ||
| return activeVal + idleVal; | ||
| } |
There was a problem hiding this comment.
The gauge callback performs two registry lookups (meterRegistry.find(...).gauge()) on every sample/scrape. To reduce overhead, consider caching the Gauge references once they’re discovered (e.g., initialize lazily on first successful lookup and reuse thereafter), while still supporting late registration of Hikari meters.
|



Describe your changes:
Fixes #26555
The Prometheus metric
db_connections_totalwas always reporting0.0because it was registered as aCounterbut theincrementDatabaseConnections()method was never called anywhere in the codebase — it was dead code.Root cause:
db.connections.totalwas a Counter that no code path ever incremented, while HikariCP'sMicrometerMetricsTrackerFactory(configured inHikariCPDataSourceFactory) already auto-registers working connection pool gauges (hikaricp.connections.active,hikaricp.connections.idle, etc.).Fix: Replace the dead Counter with a Gauge that reads from HikariCP's auto-registered metrics. The gauge computes
active + idleconnections, matching the same value as the Dropwizarddatabase.pool.TotalConnectionsmetric.Changes:
CounterwithGaugebacked by HikariCP pool metrics inOpenMetadataMetricsincrementDatabaseConnections()method andjdbiConnectionCounterfieldpoolTotalConnections()helper that sumshikaricp.connections.active+hikaricp.connections.idleOpenMetadataMetricsTestwith 2 tests covering gauge behavior (with and without HikariCP metrics)RequestLatencyContextTestare a pre-existing issue tracked in Test RequestLatencyContextTest #26953)Type of change:
Checklist:
Fixes <issue-number>: <short explanation>