Skip to content

Fixes #26555: Fix db_connections_total Prometheus metric always zero#27022

Open
RajdeepKushwaha5 wants to merge 10 commits intoopen-metadata:mainfrom
RajdeepKushwaha5:fix/26555-prometheus-db-connections-metric
Open

Fixes #26555: Fix db_connections_total Prometheus metric always zero#27022
RajdeepKushwaha5 wants to merge 10 commits intoopen-metadata:mainfrom
RajdeepKushwaha5:fix/26555-prometheus-db-connections-metric

Conversation

@RajdeepKushwaha5
Copy link
Copy Markdown
Contributor

Describe your changes:

Fixes #26555

The Prometheus metric db_connections_total was always reporting 0.0 because it was registered as a Counter but the incrementDatabaseConnections() method was never called anywhere in the codebase — it was dead code.

Root cause: db.connections.total was a Counter that no code path ever incremented, while HikariCP's MicrometerMetricsTrackerFactory (configured in HikariCPDataSourceFactory) already auto-registers working connection pool gauges (hikaricp.connections.active, hikaricp.connections.idle, etc.).

Fix: Replace the dead Counter with a Gauge that reads from HikariCP's auto-registered metrics. The gauge computes active + idle connections, matching the same value as the Dropwizard database.pool.TotalConnections metric.

Changes:

  • Replace Counter with Gauge backed by HikariCP pool metrics in OpenMetadataMetrics
  • Remove unused incrementDatabaseConnections() method and jdbiConnectionCounter field
  • Add poolTotalConnections() helper that sums hikaricp.connections.active + hikaricp.connections.idle
  • Add OpenMetadataMetricsTest with 2 tests covering gauge behavior (with and without HikariCP metrics)
  • All existing monitoring tests pass (the 5 failures in RequestLatencyContextTest are a pre-existing issue tracked in Test RequestLatencyContextTest #26953)

Type of change:

  • Bug fix

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have added a test that covers the exact scenario we are fixing.

Copilot AI review requested due to automatic review settings April 3, 2026 11:08
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 3, 2026

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes the db_connections_total Prometheus metric that previously stayed at 0.0 by replacing an unused Counter with a Gauge that reflects HikariCP pool size (active + idle), aligning the Prometheus view with the Dropwizard database.pool.TotalConnections value.

Changes:

  • Replace db.connections.total from a dead Counter to a Gauge computed from HikariCP Micrometer gauges (hikaricp.connections.active + hikaricp.connections.idle).
  • Remove the unused incrementDatabaseConnections() path.
  • Add unit tests validating the new gauge behavior with and without HikariCP gauges present.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
openmetadata-service/src/main/java/org/openmetadata/service/monitoring/OpenMetadataMetrics.java Switches db.connections.total to a Gauge derived from HikariCP pool gauges and removes dead Counter increment code.
openmetadata-service/src/test/java/org/openmetadata/service/monitoring/OpenMetadataMetricsTest.java Adds tests verifying db.connections.total reflects active+idle and returns 0 when Hikari metrics are absent.

@RajdeepKushwaha5 RajdeepKushwaha5 force-pushed the fix/26555-prometheus-db-connections-metric branch from 3b7d4be to 58bb55e Compare April 3, 2026 11:21
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 3, 2026

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

harshach
harshach previously approved these changes Apr 3, 2026
Copilot AI review requested due to automatic review settings April 3, 2026 15:32
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment on lines +53 to +55
Gauge.builder("db.connections", () -> poolTotalConnections())
.description("Total connections in the database connection pool")
.register(meterRegistry);
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR/issue targets the Prometheus metric db_connections_total, but this change registers a gauge named db.connections instead of keeping db.connections.total. With Micrometer’s Prometheus naming, this will expose db_connections (gauge) and the original db_connections_total metric will disappear/rename, which likely breaks existing dashboards and doesn’t match the issue reproduction/expectation. Consider keeping the original metric name (db.connections.total) and switching only the meter type to a gauge (or publish both names temporarily for backward compatibility).

Copilot uses AI. Check for mistakes.
Comment on lines +21 to +35
@Test
void dbConnectionsTotalReflectsHikariCPPoolSize() {
AtomicInteger activeConnections = new AtomicInteger(5);
AtomicInteger idleConnections = new AtomicInteger(15);

Gauge.builder("hikaricp.connections.active", activeConnections, AtomicInteger::doubleValue)
.register(registry);
Gauge.builder("hikaricp.connections.idle", idleConnections, AtomicInteger::doubleValue)
.register(registry);

new OpenMetadataMetrics(registry);

Gauge total = registry.find("db.connections").gauge();
assertNotNull(total, "db.connections gauge should be registered");
assertEquals(20.0, total.value(), 0.01, "Should equal active + idle");
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests validate the computed value, but they don’t assert the Prometheus-exposed metric name/type (the original bug report is about /prometheus output). Adding a PrometheusMeterRegistry-based assertion (scrape contains db_connections_total with TYPE gauge) would catch naming/registry-conversion regressions and ensure the fix matches the reported behavior.

Copilot uses AI. Check for mistakes.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 3, 2026

🟡 Playwright Results — all passed (19 flaky)

✅ 3954 passed · ❌ 0 failed · 🟡 19 flaky · ⏭️ 86 skipped

Shard Passed Failed Flaky Skipped
🟡 Shard 1 296 0 3 4
🟡 Shard 2 752 0 7 8
🟡 Shard 3 729 0 3 7
🟡 Shard 4 755 0 4 18
✅ Shard 5 687 0 0 41
🟡 Shard 6 735 0 2 8
🟡 19 flaky test(s) (passed on retry)
  • Features/CustomizeDetailPage.spec.ts › Domain - customization should work (shard 1, 1 retry)
  • Pages/AuditLogs.spec.ts › should apply both User and EntityType filters simultaneously (shard 1, 1 retry)
  • Pages/UserCreationWithPersona.spec.ts › Create user with persona and verify on profile (shard 1, 1 retry)
  • Features/ActivityAPI.spec.ts › Activity event is created when description is updated (shard 2, 1 retry)
  • Features/ActivityAPI.spec.ts › Activity event is created when owner is added (shard 2, 1 retry)
  • Features/ColumnBulkOperations.spec.ts › should discard changes when closing drawer without saving (shard 2, 1 retry)
  • Features/CustomMetric.spec.ts › Table custom metric (shard 2, 1 retry)
  • Features/DomainFilterQueryFilter.spec.ts › Quick filters should persist when domain filter is applied and cleared (shard 2, 1 retry)
  • Features/ExploreQuickFilters.spec.ts › tier with assigned asset appears in dropdown, tier without asset does not (shard 2, 1 retry)
  • Features/IncidentManager.spec.ts › Complete Incident lifecycle with table owner (shard 2, 1 retry)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
  • Features/Table.spec.ts › Tags term should be consistent for search (shard 3, 1 retry)
  • Features/UserProfileOnlineStatus.spec.ts › Should show online status badge on user profile for active users (shard 3, 1 retry)
  • Pages/DataContractInheritance.spec.ts › Run Validation - Inherited contract validation uses entity-based validation (shard 4, 1 retry)
  • Pages/DataContractInheritance.spec.ts › Remove Asset - Inherited contract no longer shown when asset is removed from Data Product (shard 4, 1 retry)
  • Pages/DataContracts.spec.ts › Create Data Contract and validate for MlModel (shard 4, 1 retry)
  • Pages/Domains.spec.ts › Domain Rbac (shard 4, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

… scrape test

- Restore gauge name to 'db.connections.total' so Micrometer's Prometheus
  naming convention exposes it as 'db_connections_total', preserving
  backward compatibility with existing dashboards.
- Add prometheusScrapExposesDbConnectionsTotal() test that uses
  PrometheusMeterRegistry to verify the Prometheus scrape output contains
  'db_connections_total' as a gauge type.
…uffix

The new Prometheus Java client (used by PrometheusMeterRegistry in
micrometer 1.14.5) strips the _total suffix from metric names because
it is reserved for counters. The gauge db.connections.total was being
sanitized to db_connections, causing the test to fail when looking for
db_connections_total in the scrape output.

Renamed to db.pool.connections which exports as db_pool_connections
in Prometheus format, avoiding suffix conflicts.
Copilot AI review requested due to automatic review settings April 17, 2026 21:22
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented Apr 26, 2026

Code Review ✅ Approved 2 resolved / 2 findings

Fixes the db_connections_total Prometheus metric reporting by standardizing naming and correcting the test method typo. No issues remain.

✅ 2 resolved
Quality: Metric name inconsistent with sibling counters

📄 openmetadata-service/src/main/java/org/openmetadata/service/monitoring/OpenMetadataMetrics.java:53
The new gauge is named db.connections while sibling metrics use the .total suffix convention (e.g., db.errors.total, db.connections.total previously). Since the old metric was always zero and nobody could have been relying on it, this isn't a breaking change—but aligning with the existing naming pattern (db.connections.total) would keep the metric namespace consistent and make discovery more predictable for operators.

This is purely cosmetic; feel free to ignore if you prefer the shorter name.

Quality: Typo in test method name: 'Scrap' should be 'Scrape'

📄 openmetadata-service/src/test/java/org/openmetadata/service/monitoring/OpenMetadataMetricsTest.java:59
The test method prometheusScrapExposesDbConnectionsTotal has a typo — 'Scrap' should be 'Scrape'. Minor readability issue.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment on lines +207 to +213
private double poolTotalConnections() {
Gauge active = meterRegistry.find("hikaricp.connections.active").gauge();
Gauge idle = meterRegistry.find("hikaricp.connections.idle").gauge();
double activeVal = active != null ? active.value() : 0.0;
double idleVal = idle != null ? idle.value() : 0.0;
return activeVal + idleVal;
}
Copy link

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The gauge callback performs two registry lookups (meterRegistry.find(...).gauge()) on every sample/scrape. To reduce overhead, consider caching the Gauge references once they’re discovered (e.g., initialize lazily on first successful lookup and reuse thereafter), while still supporting late registration of Hikari meters.

Copilot uses AI. Check for mistakes.
@sonarqubecloud
Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

prometheus metric db_connections_total not updated / always zero

4 participants