Conversation
}

func (o *CompletedAddDatasourceOptions) Run(ctx context.Context) error {
	logger := logr.FromContextOrDiscard(ctx)
Error if not found; a not-found here means the programmer made a mistake with the command creation.
}

func (o *CompletedAddDatasourceOptions) getValidWorkspaceIDs(ctx context.Context) (map[string]bool, error) {
	logger := logr.FromContextOrDiscard(ctx)
You have the logger in the caller; please pass it in, so you get the contextual fields the caller has added to it.
Alternatively, create a child context for this func and put the logger in it, so when you take it out you get those fields.
That's the whole goal of contextual logging: add more and more context as you go down the call chain.
logger := logr.FromContextOrDiscard(ctx)
retryNeeded := true
validWorkspaceIDs := make(map[string]bool)
for retryNeeded {
nit: s/retryNeeded/allWorkspacesProvisioned/g
When more than one job is adding workspaces using this command (like in e2e), won't we end up waiting for a very long time? Can we figure out which workspaces we care about, so we only wait for those? Or do we need to wait for all of them, because otherwise our future addition will break? Please explain this in a comment.
I had a first draft with an exemption for prow clusters (i.e. if the cluster is prow AND the Grafana workspace is not ready: skip). That way we can continue fast. Usually I'd expect the workspace to be ready, since it comes in a pipeline step before this one. Grafana, however, might be updating all the time because of other tests, so this might run sometimes, but not always. Does that make sense?
This would give us some confidence; not 100%, but I think it could be good enough.
What I would like to understand is: why are we waiting? Are we waiting to get a valid view of the current workspaces, or are we waiting because trying to add more workspaces while others are not done provisioning will error? If it's the former, we could try to figure out, given the workspace we're being asked to create, which other workspace(s) we need to wait on (this prow run, this region, etc.). If it's the latter, we need to wait for all of them.
We are waiting because the provided workspace is the one provisioned in this test run. If it is not ready, the datasource check in Grafana will fail; at least, we saw HTTP 503 errors.
Awesome. Can you please put this rationale into a comment on the wait code, and make sure to factor it so we only wait on the one workspace we specifically care about?
stevekuznetsov left a comment:
Moving in a good direction! Nice work.
Force-pushed from a39dddb to efd69ca.
Force-pushed from ff18080 to 9b5ac5b.
/lgtm
Force-pushed from 32ee730 to 9be5fda.
/retest-required
Creates a command that can, for example, run as a cronjob to reconcile datasources configured in Grafana. This can be a base for further work, e.g. running it as a cron on a cluster or building an operator out of it.
Force-pushed from 9be5fda to baff291.
/test integration
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: janboll, raelga, stevekuznetsov. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files. Approvers can indicate their approval by writing /approve in a comment.
/retest
@janboll Is this going to be merged?
PR needs rebase.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@janboll: The following tests failed, say /retest to rerun all failed tests.
Full PR test history. Your PR dashboard.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Add grafanactl modify command for reconciling Azure Monitor Workspace datasources in Grafana. Update base options to support grafana-resource-id flag, upgrade armdashboard to v2 with async polling, and add GrafanaDatasources pipeline action type with schema and test support. Initial work from: Azure/ARO-HCP#3952
superseded by:
https://issues.redhat.com/browse/AROSLSRE-410
What
Replace the shell script-based Grafana datasource management with a new Go CLI tool grafanactl. Updated dev-infrastructure/region-pipeline.yaml to use the new tool, removing the add-grafana-datasource.sh script.
Why
The script lacked proper error handling, retry logic, and validation of Azure resources before attempting operations. This caused deployment failures.