monitoring: migrate dashboards to native histograms#5048

Open: simonswine wants to merge 2 commits into main from native-histogram-dashboards

@simonswine
Contributor

@simonswine simonswine commented Apr 13, 2026

Summary

Converts all classic Prometheus histogram queries to native histogram equivalents across all monitoring dashboards and Helm chart templates.

  • Remove _bucket suffix and by (le) from histogram_quantile calls
  • Replace rate(metric_count{...}) with histogram_count(rate(metric{...}))
  • Replace rate(metric_sum{...}) with histogram_sum(rate(metric{...}))
  • Replace _sum/_count average ratio with histogram_avg(sum(rate(metric{...})))
  • Update values.yaml tenantQuery default
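
As a rough illustration of the rewrites listed above, the pairs below show each classic query next to its native equivalent. The metric name `pyroscope_request_duration_seconds`, the `namespace` selector, and the fixed `[5m]` window are placeholders, and the `classic`/`native` keys are only labels for this sketch, not fields from the chart.

```yaml
# Sketch only: metric name, selector, and range window are hypothetical.
quantile:
  # classic needs the _bucket series and must keep the le label
  classic: 'histogram_quantile(0.99, sum by (le) (rate(pyroscope_request_duration_seconds_bucket{namespace="example"}[5m])))'
  # native drops the _bucket suffix and the by (le) grouping
  native: 'histogram_quantile(0.99, sum(rate(pyroscope_request_duration_seconds{namespace="example"}[5m])))'
observation_rate:
  classic: 'rate(pyroscope_request_duration_seconds_count{namespace="example"}[5m])'
  native: 'histogram_count(rate(pyroscope_request_duration_seconds{namespace="example"}[5m]))'
sum_rate:
  classic: 'rate(pyroscope_request_duration_seconds_sum{namespace="example"}[5m])'
  native: 'histogram_sum(rate(pyroscope_request_duration_seconds{namespace="example"}[5m]))'
average:
  classic: 'sum(rate(pyroscope_request_duration_seconds_sum{namespace="example"}[5m])) / sum(rate(pyroscope_request_duration_seconds_count{namespace="example"}[5m]))'
  native: 'histogram_avg(sum(rate(pyroscope_request_duration_seconds{namespace="example"}[5m])))'
```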

Dashboards affected: operational, v2-metastore, v2-read-path, v2-write-path

The raw JSON dashboards under operations/monitoring/dashboards/ are generated via make helm/check — only the Helm templates were hand-edited.

Test plan

  • make helm/check passes (already verified locally)
  • Confirm NHCB scraping is enabled before merging (requires scrape_native_histograms: true on the Prometheus scrape config for Pyroscope)
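
For reference, a minimal sketch of what the scrape-side setting might look like, assuming a Prometheus version that supports `scrape_native_histograms` in the scrape config. The job name and target address are placeholders, and the commented-out NHCB line is an assumption about an alternative for targets that only expose classic histograms.

```yaml
# Sketch only: job name and target are hypothetical.
scrape_configs:
  - job_name: pyroscope
    # The setting named in the test plan above.
    scrape_native_histograms: true
    # Possible alternative (assumption): convert classic histograms to
    # native histograms with custom buckets at scrape time.
    # convert_classic_histograms_to_nhcb: true
    static_configs:
      - targets: ["pyroscope:4040"]
```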

Note

Low Risk
Low risk: this only changes Makefile generation steps for rendered dashboards and does not affect runtime code paths, but it can change generated dashboard artifacts and CI outputs.

Overview
make helm/check now generates monitoring dashboards/rules for native histograms by default, and additionally renders a classic Prometheus histogram dashboard set by templating pyroscope-monitoring with --set dashboards.nativeHistograms=false and exporting to operations/monitoring/dashboards-classic-histogram/.

Reviewed by Cursor Bugbot for commit 82cbf68.

@marcsanmi
Contributor

afaiu, after this change users who don't have scrape_native_histograms: true (or NHCB conversion) on their config will see empty panels after upgrading.

Should we document this in some way?

@simonswine
Contributor Author

> afaiu, after this change users who don't have scrape_native_histograms: true (or NHCB conversion) on their config will see empty panels after upgrading.
>
> Should we document this in some way?

I am not a fan of this aspect either. We could have native and non-native dashboards with ifs in the Helm templates, but that's also very ugly. wdyt?

…back

Converts all classic Prometheus histogram queries to native histogram
equivalents, with a Helm flag to switch back to classic queries for
environments that haven't enabled scrape_native_histograms yet.

- Add dashboards.nativeHistograms (default: true) to values.yaml
- Wrap every histogram expr block in {{- if .Values.dashboards.nativeHistograms }}
  / {{- else }} / {{- end }} across all four dashboard templates (107 blocks)
- Transform rules per branch:
  - histogram_sum/count(rate(X)) ↔ rate(X_sum/count)
  - histogram_quantile without _bucket ↔ adds _bucket + by (le, ...)
  - histogram_avg(sum(rate(X))) ↔ sum(rate(X_sum)) / sum(rate(X_count))
- Add tenantQueryClassic to values.yaml for the tenant mapping panel
- Add --output.dashboards.path flag to monitoring-chart-extractor
- make helm/check now generates both:
  - operations/monitoring/dashboards/ (native, default)
  - operations/monitoring/dashboards-classic-histogram/ (classic)
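
A minimal sketch of how one wrapped `expr` block might look, using the `dashboards.nativeHistograms` flag described above. The panel layout, metric name, and selector are hypothetical; the real templates may be structured differently.

```yaml
# values.yaml: flag name and default as described in this commit message
dashboards:
  nativeHistograms: true
---
# Dashboard template fragment (sketch only: panel structure and metric are hypothetical)
targets:
  - refId: A
    {{- if .Values.dashboards.nativeHistograms }}
    expr: 'histogram_quantile(0.99, sum(rate(pyroscope_request_duration_seconds{namespace="example"}[5m])))'
    {{- else }}
    expr: 'histogram_quantile(0.99, sum by (le) (rate(pyroscope_request_duration_seconds_bucket{namespace="example"}[5m])))'
    {{- end }}
```

Rendering the chart with `--set dashboards.nativeHistograms=false` would then select the classic branch.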
@simonswine simonswine force-pushed the native-histogram-dashboards branch 2 times, most recently from 844230e to d024237 on May 15, 2026 12:03
Bump k8s-monitoring subchart from 3.7.2 to 3.8.8 which added the
scrapeNativeHistograms global toggle, and enable it so Alloy scrapes
native histograms from Pyroscope's protobuf endpoint.

The dashboard queries from the previous commit already use native
histogram functions; this unblocks them.
@github-actions
Contributor

TruffleHog Scan Results

Summary: Found 1 potential secrets (0 verified, 1 unverified)

  • Possible secret (GoogleGeminiAPIKey) at operations/monitoring/helm/pyroscope-monitoring/charts/k8s-monitoring-3.8.8.tgz:1098 AIza***QX4U

Review: Check if unverified secrets are false positives.


Ignoring False Positives:
To mark a false positive, add # trufflehog:ignore as an inline comment on the same line as the detected secret:

my_fake_secret = "AKIAIOSFODNN7EXAMPLE"  # trufflehog:ignore

This works for files that support line numbers (most source files). After adding the comment, push your changes and the scan will re-run.

@simonswine
Contributor Author

  • operations/monitoring/helm/pyroscope-monitoring/charts/k8s-monitoring-3.8.8.tgz

https://github.com/grafana/k8s-monitoring-helm/blob/6da004416be81e67ddedd6199cd3d8fddeb37721/charts/k8s-monitoring-v1/values.yaml#L2584

Not too sure if we can ignore this the same way

@simonswine
Contributor Author

Okay, I have implemented variants with and without native histograms. Our Helm chart now also supports native histograms since the chart update that bumped k8s-monitoring 3.7.2 → 3.8.8.

@marcsanmi please take another look

@simonswine simonswine marked this pull request as ready for review May 15, 2026 15:28
@simonswine simonswine requested a review from a team as a code owner May 15, 2026 15:28