Skip to content

Recreate grafana add datasources#3952

Closed
janboll wants to merge 4 commits intomainfrom
recreate-grafana-add-datasources
Closed

Recreate grafana add datasources#3952
janboll wants to merge 4 commits intomainfrom
recreate-grafana-add-datasources

Conversation

@janboll
Copy link
Copy Markdown
Collaborator

@janboll janboll commented Jan 30, 2026

https://issues.redhat.com/browse/AROSLSRE-410

What

Replace the shell script-based Grafana datasource management with a new Go CLI tool grafanactl. Updated dev-infrastructure/region-pipeline.yaml to use the new tool, removing the add-grafana-datasource.sh script.

Why

The script lacked proper error handling, retry logic, and validation of Azure resources before attempting operations. This caused deployment failures

Comment thread tooling/grafanactl/cmd/modify/options.go Outdated
Comment thread tooling/grafanactl/cmd/modify/options.go Outdated
Comment thread tooling/grafanactl/README.md Outdated
Comment thread tooling/grafanactl/README.md Outdated
Comment thread tooling/grafanactl/README.md Outdated
Comment thread tooling/grafanactl/README.md
Comment thread tooling/grafanactl/cmd/modify/cmd.go Outdated
}

func (o *CompletedAddDatasourceOptions) Run(ctx context.Context) error {
logger := logr.FromContextOrDiscard(ctx)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error if not found, not found -> programmer made a mistake with the command creation

Comment thread tooling/grafanactl/cmd/modify/cmd.go Outdated
Comment thread tooling/grafanactl/cmd/modify/cmd.go Outdated
Comment thread tooling/grafanactl/cmd/modify/cmd.go Outdated
Comment thread tooling/grafanactl/cmd/modify/cmd.go Outdated
Comment thread tooling/grafanactl/cmd/modify/cmd.go Outdated
}

func (o *CompletedAddDatasourceOptions) getValidWorkspaceIDs(ctx context.Context) (map[string]bool, error) {
logger := logr.FromContextOrDiscard(ctx)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you have the logger in the caller, please pass it in (so you get the contextual fields the caller has added to it)

alternatively, create a child context for this func and put logger, so when you take it out, you get those fields

that's the whole goal with contextual logging - add more and more context as you go down the call chain

Comment thread tooling/grafanactl/cmd/modify/cmd.go Outdated
logger := logr.FromContextOrDiscard(ctx)
retryNeeded := true
validWorkspaceIDs := make(map[string]bool)
for retryNeeded {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/retryNeeded/allWorkspacesProvisioned/g

when more than one job is adding workspaces using this command (like in e2e), won't we end up waiting for a very long time? can we figure out which workspaces we care about, so we only wait for those? or do we need to wait for all, or our addition in the future will break? explain this in a comment please

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a first draft with an exemption for prow clusters (I.e. if cluster is prow AND grafana/workspace is not ready: skip). That way we can continue fast. Usually I'd suspect the Workspace to be ready since it comes in a pipeline step before this one. Grafana however, might be updating all the time cause of other tests, thus this might run sometimes then, but not always. Does that make sense?
This would give us some confidence, given not 100%, but I think it could be good enough.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I would like to understand is - why are we waiting? Are we waiting to get a valid view of current workspaces, or are we waiting because trying to add more workspaces while others are not done provisioning will error? If it's the former, we could try to figure out - given the workspace we're being asked to make, which other workspace(s) do we need to wait on (this prow run, this region, etc). If it's the latter, we need to wait for all.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are waiting, cause the provided one is provisioned in this test run. If it is not ready, the datasource check in grafana will fail. At least we saw http 503 errors.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome - can you please put this rationale into a comment for this wait code - and make sure to factor it so we only wait on the one we specifically care about

Copy link
Copy Markdown
Contributor

@stevekuznetsov stevekuznetsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving in a good direction! Nice work.

@janboll janboll force-pushed the recreate-grafana-add-datasources branch from a39dddb to efd69ca Compare January 30, 2026 15:04
Comment thread dev-infrastructure/region-pipeline.yaml Outdated
@janboll janboll force-pushed the recreate-grafana-add-datasources branch from ff18080 to 9b5ac5b Compare February 2, 2026 15:30
@raelga
Copy link
Copy Markdown
Collaborator

raelga commented Feb 8, 2026

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Feb 8, 2026
@janboll janboll force-pushed the recreate-grafana-add-datasources branch 3 times, most recently from 32ee730 to 9be5fda Compare February 19, 2026 15:29
@janboll
Copy link
Copy Markdown
Collaborator Author

janboll commented Feb 20, 2026

/retest-required

Creates a command that can i.e. run as a cronjob to reconcile
datasources to be configured in Grafana. This can be a base for further
work, i.e. running it as cron on a cluster or creating an operator out
of it.
@janboll janboll force-pushed the recreate-grafana-add-datasources branch from 9be5fda to baff291 Compare February 20, 2026 11:34
@stevekuznetsov
Copy link
Copy Markdown
Contributor

/test integration

@stevekuznetsov
Copy link
Copy Markdown
Contributor

/lgtm

@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Feb 20, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: janboll, raelga, stevekuznetsov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [janboll,raelga,stevekuznetsov]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@raelga
Copy link
Copy Markdown
Collaborator

raelga commented Mar 14, 2026

/retest

@raelga
Copy link
Copy Markdown
Collaborator

raelga commented Mar 15, 2026

@janboll Is this going to be merged?

@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Mar 22, 2026

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Apr 1, 2026

@janboll: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/secrets-validation baff291 link true /test secrets-validation
ci/prow/e2e-parallel baff291 link true /test e2e-parallel
ci/prow/e2e-images baff291 link true /test e2e-images
ci/prow/baseimage-generator-images baff291 link true /test baseimage-generator-images

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

janboll added a commit to Azure/ARO-Tools that referenced this pull request Apr 8, 2026
Add grafanactl modify command for reconciling Azure Monitor Workspace datasources in Grafana. Update base options to support grafana-resource-id flag, upgrade armdashboard to v2 with async polling, and add GrafanaDatasources pipeline action type with schema and test  support. Initial work from: Azure/ARO-HCP#3952
janboll added a commit to Azure/ARO-Tools that referenced this pull request Apr 8, 2026
Add grafanactl modify command for reconciling Azure Monitor Workspace datasources in Grafana. Update base options to support grafana-resource-id flag, upgrade armdashboard to v2 with async polling, and add GrafanaDatasources pipeline action type with schema and test  support. Initial work from: Azure/ARO-HCP#3952
@janboll
Copy link
Copy Markdown
Collaborator Author

janboll commented Apr 10, 2026

superseeded by:
Azure/ARO-Tools#218
#4791

@janboll janboll closed this Apr 10, 2026
@janboll janboll deleted the recreate-grafana-add-datasources branch April 10, 2026 12:19
stevekuznetsov pushed a commit to Azure/ARO-Tools that referenced this pull request Apr 10, 2026
Add grafanactl modify command for reconciling Azure Monitor Workspace datasources in Grafana. Update base options to support grafana-resource-id flag, upgrade armdashboard to v2 with async polling, and add GrafanaDatasources pipeline action type with schema and test  support. Initial work from: Azure/ARO-HCP#3952
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants