Recreate grafana add datasources by janboll · Pull Request #3952 · Azure/ARO-HCP

janboll · 2026-01-30T12:55:28Z

https://issues.redhat.com/browse/AROSLSRE-410

What

Replace the shell script-based Grafana datasource management with a new Go CLI tool, grafanactl. Update dev-infrastructure/region-pipeline.yaml to use the new tool and remove the add-grafana-datasource.sh script.

Why

The script lacked proper error handling, retry logic, and validation of Azure resources before attempting operations. This caused deployment failures.

stevekuznetsov · 2026-01-30T14:33:55Z

+}
+
+func (o *CompletedAddDatasourceOptions) Run(ctx context.Context) error {
+	logger := logr.FromContextOrDiscard(ctx)


Return an error if the logger is not found. A missing logger means the programmer made a mistake when creating the command.

stevekuznetsov · 2026-01-30T14:56:03Z

+}
+
+func (o *CompletedAddDatasourceOptions) getValidWorkspaceIDs(ctx context.Context) (map[string]bool, error) {
+	logger := logr.FromContextOrDiscard(ctx)


you have the logger in the caller, please pass it in (so you get the contextual fields the caller has added to it)

alternatively, create a child context for this func and put logger, so when you take it out, you get those fields

that's the whole goal with contextual logging - add more and more context as you go down the call chain

stevekuznetsov · 2026-01-30T14:58:55Z

+	logger := logr.FromContextOrDiscard(ctx)
+	retryNeeded := true
+	validWorkspaceIDs := make(map[string]bool)
+	for retryNeeded {


nit: s/retryNeeded/allWorkspacesProvisioned/g

when more than one job is adding workspaces using this command (like in e2e), won't we end up waiting for a very long time? can we figure out which workspaces we care about, so we only wait for those? or do we need to wait for all, or our addition in the future will break? explain this in a comment please

I had a first draft with an exemption for prow clusters (i.e. if the cluster is prow AND the grafana/workspace is not ready: skip). That way we can continue fast. Usually I'd expect the Workspace to be ready, since it comes in a pipeline step before this one. Grafana, however, might be updating all the time because of other tests, so this step would run sometimes, but not always. Does that make sense?
This would give us some confidence, though not 100%, but I think it could be good enough.

What I would like to understand is - why are we waiting? Are we waiting to get a valid view of current workspaces, or are we waiting because trying to add more workspaces while others are not done provisioning will error? If it's the former, we could try to figure out - given the workspace we're being asked to make, which other workspace(s) do we need to wait on (this prow run, this region, etc). If it's the latter, we need to wait for all.

We are waiting because the provided workspace is provisioned in this test run. If it is not ready, the datasource check in Grafana will fail. At least, we saw HTTP 503 errors.

Awesome - can you please put this rationale into a comment for this wait code - and make sure to factor it so we only wait on the one we specifically care about

stevekuznetsov

Moving in a good direction! Nice work.

raelga · 2026-02-08T06:45:10Z

/lgtm

janboll · 2026-02-20T08:16:56Z

/retest-required

Creates a command that can, e.g., run as a cronjob to reconcile the datasources to be configured in Grafana. This can be a base for further work, e.g. running it as a cron job on a cluster or creating an operator out of it.

stevekuznetsov · 2026-02-20T18:23:02Z

/test integration

stevekuznetsov · 2026-02-20T18:23:10Z

/lgtm

openshift-ci · 2026-02-20T18:23:18Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: janboll, raelga, stevekuznetsov

raelga · 2026-03-14T22:22:55Z

/retest

raelga · 2026-03-15T14:47:58Z

@janboll Is this going to be merged?

openshift-ci · 2026-03-22T22:31:51Z

PR needs rebase.

openshift-ci · 2026-04-01T18:18:10Z

@janboll: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/secrets-validation | `baff291` | link | true | `/test secrets-validation` |
| ci/prow/e2e-parallel | `baff291` | link | true | `/test e2e-parallel` |
| ci/prow/e2e-images | `baff291` | link | true | `/test e2e-images` |
| ci/prow/baseimage-generator-images | `baff291` | link | true | `/test baseimage-generator-images` |

Add grafanactl modify command for reconciling Azure Monitor Workspace datasources in Grafana. Update base options to support grafana-resource-id flag, upgrade armdashboard to v2 with async polling, and add GrafanaDatasources pipeline action type with schema and test support. Initial work from: Azure/ARO-HCP#3952

janboll · 2026-04-10T12:19:47Z

Superseded by:
Azure/ARO-Tools#218
#4791

openshift-ci bot requested review from geoberle and stevekuznetsov January 30, 2026 12:55

openshift-ci bot added the approved label Jan 30, 2026