fix(workload-alloy): correct metric_pods relabel (follow-up #206) by danielgines · Pull Request #40 · Estabilis/estabilis-platform-gitops

danielgines · 2026-05-04T09:09:50Z

Summary

Same metric_pods relabel bug shipped fixed in the hub Alloy via Estabilis/estabilis-platform#148 — applies to the workload Alloy values too. Without this fix, any pod with prometheus.io/scrape=true annotation on a workload cluster has its /metrics endpoint silently unreachable (the rule wrote __address__="<port>:<port>" instead of "<pod_ip>:<port>").

Diff

Replaces the broken rule with the canonical Prometheus pattern (combine __address__ with the annotation port), and adds a companion rule honoring prometheus.io/path when present.

Scope vs hub fix

Bug 1 (relabel): present in workload Alloy → fixed here.
Bug 2 (OTLP receiver routes): workload Alloy doesn't run an OTLP receiver; only the hub does. Not applicable.

Test plan

helm template grafana/alloy --version 1.6.2 -f values/workload/alloy.yaml renders cleanly.
alloy fmt on rendered config: exit 0.
alloy run on rendered config: exit 0; only the expected sys.env() errors when run offline.
After deploy on a workload cluster (none in production yet): pod with annotation has metrics scraped.

Urgency

Lower than the hub fix — no workload clusters in production at the time of writing. Merging keeps the workload Alloy ready for the first cluster that gets provisioned.

Closes Estabilis/estabilis-platform-tools#210

Same regex bug shipped fixed in the hub Alloy via upstream PR #148: the previous rule used only the prometheus.io/port annotation as source label with regex "(.+)" and replacement "${1}:$0", causing both backrefs to resolve to the captured port value. The result was __address__="<port>:<port>" (e.g. "9090:9090") instead of the expected "<pod_ip>:<port>". Any pod relying on annotation-based scrape on a workload cluster never had its /metrics endpoint reached. Replaced with the canonical Prometheus pattern: combine __address__ (pod IP from discovery) with the annotation port, matching "([^:]+)(?::\d+)?;(\d+)" and replacing with "$1:$2". Also added a companion rule honoring prometheus.io/path when present. Workload Alloy does not run an OTLP receiver — only the hub does — so Bug 2 from the original issue does not apply here. Single-edit fix, ~15 lines. Validation: - helm template grafana/alloy --version 1.6.2 -f values/workload/alloy.yaml renders cleanly. - alloy fmt + alloy run on the rendered config exit 0 (only the expected sys.env() errors fire when run offline without the LOKI_PUSH_URL / MIMIR_PUSH_URL / CLUSTER_NAME env vars set). Lower urgency than the hub fix because there are no workload clusters in production at the time of writing; merging keeps the workload alloy ready for the first cluster that gets provisioned. Closes Estabilis/estabilis-platform-tools#210

danielgines merged commit a04496f into main May 4, 2026
5 checks passed

danielgines deleted the fix/210-workload-alloy-metric-pods-relabel branch May 4, 2026 09:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(workload-alloy): correct metric_pods relabel (follow-up #206)#40

fix(workload-alloy): correct metric_pods relabel (follow-up #206)#40
danielgines merged 1 commit intomainfrom
fix/210-workload-alloy-metric-pods-relabel

danielgines commented May 4, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

danielgines commented May 4, 2026

Summary

Diff

Scope vs hub fix

Test plan

Urgency

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant