Fix: return 404 when attempting invalid listing in dev environment#4862
Fix: return 404 when attempting invalid listing in dev environment#4862
Conversation
enable negative listing test cases initially to observe behaviour in CI https://redhat.atlassian.net/browse/ARO-25089
|
Skipping CI for Draft Pull Request. |
|
/test e2e-parallel |
|
|
/test stage-e2e-parallel |
|
/test e2e-parallel |
cbeed30 to
11dfe21
Compare
|
/test e2e-parallel |
|
/test stage-e2e-parallel |
|
/test e2e-parallel |
|
/test stage-e2e-parallel |
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Fixes dev-environment listing endpoints to return appropriate 404 ARM *CloudError responses (instead of 500s) when the resource group or parent resource does not exist.
Changes:
- Add Azure ARM-backed
ResourceGroupCheckerand wire it intoFrontend. - Introduce new ARM error helpers for resource-group and parent-resource “not found” cases.
- Update cluster / node pool / external auth list handlers to return 404s with correct CloudError payloads.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| test-integration/utils/integrationutils/utils.go | Updates frontend construction to pass the new resourceGroupChecker parameter. |
| test-integration/go.mod | Adds ARM resources SDK dependency used for RG existence checks. |
| internal/api/arm/error.go | Adds new CloudError code + constructors for RG-not-found and parent-not-found. |
| frontend/pkg/frontend/testhelpers.go | Updates test frontend constructor for new parameter. |
| frontend/pkg/frontend/resource_group.go | Adds ResourceGroupChecker + Azure implementation using ARM ResourceGroups client. |
| frontend/pkg/frontend/node_pool.go | Returns 404 CloudErrors for RG-not-found / parent-cluster-not-found during node pool list. |
| frontend/pkg/frontend/frontend_test.go | Updates test frontend instantiations for new parameter. |
| frontend/pkg/frontend/frontend.go | Extends Frontend to hold a ResourceGroupChecker and plumbs it through constructor. |
| frontend/pkg/frontend/external_auth.go | Returns 404 CloudErrors for RG-not-found / parent-cluster-not-found during external auth list. |
| frontend/pkg/frontend/cluster.go | Checks RG existence when cluster list is empty to distinguish RG-not-found from empty list. |
| frontend/cmd/cmd.go | Instantiates DefaultAzureCredential + RG checker and injects it into the frontend. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
In your pull request description you say:
In the dev environment, there is no ARM proxy layer in front of the RP to validate resource groups before requests reach the frontend.
I understand that to mean azureResourceGroupChecker is only applicable to development environments, not to environments where ARM is present because a collection GET request for a non-existent resource group would never even get routed to the RP.
I think this is important to capture in the source code so it doesn't get removed down the road by someone thinking "this can never happen in production, why do we need it?"
Could you add a comment somewhere explaining this? Wherever you think appropriate.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 12 out of 14 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
I believe that would be possible only for the non-existent listings. The case, where a parent resource is non-existent would need to be dropped. That is fine with me if you think the current approach is too much |
If the negative test cases you've added to #4876 are deemed worth keeping, maybe just split them out in such a way that they don't have the Those cases would then be testing more ARM behavior than RP behavior, so maybe it's appropriate to drop them entirely. |
777b8f8 to
da6349e
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
/test e2e-parallel |
| // parseResourceGroupFromPath extracts the subscription ID and resource group | ||
| // name from a URL path like /subscriptions/{subId}/resourceGroups/{rgName}/... | ||
| func parseResourceGroupFromPath(path string) (subscriptionID, resourceGroupName string) { | ||
| parts := strings.Split(strings.ToLower(path), "/") |
There was a problem hiding this comment.
Consider using strings.EqualFold() here. You will be able to drop the ToLower() step as it is case insensitive. For example:
parts := strings.Split(path, "/")
for i, part := range parts {
if strings.EqualFold(part, "subscriptions") && i+1 < len(parts) {
subscriptionID = parts[i+1]
}
if strings.EqualFold(part, "resourcegroups") && i+1 < len(parts) {
resourceGroupName = parts[i+1]
}
}
| _, err = client.Get(req.Raw().Context(), rgName, nil) | ||
| if err != nil { | ||
| var respErr *azcore.ResponseError | ||
| if errors.As(err, &respErr) && respErr.ErrorCode == "ResourceGroupNotFound" { |
There was a problem hiding this comment.
Please replace the literal "ResourceGroupNotFound" with arm.CloudErrorCodeResourceGroupNotFound to be consistent with the rest of the code.
|
In addition to previous comments I think might want to add a unit test for the ARM simulator bits in parseResourceGroupFromPath() and other functions in azure_client_policies.go . Adding it in separate PR would be fine. |
|
@mvacula02: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/test e2e-parallel |
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mgahagan73, mvacula02 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
fixes ARO-25089 and enables ARO-22570, ARO-22571
What
Implement a fix to error handling after invalid listing. Currently, an internal server occurs instead of the expected ResourceNotFound or similar in the dev environment. Investigation shows that frontend's writeError couldn't extract a resource ID from list endpoint URLs.
PR currently only enables the test cases to observe the CI behaviour first. Afterwards a bug fix will be implemented.The fix adds a ResourceGroupChecker that queries Azure for resource group existence. In the dev environment, there is no ARM proxy layer in front of the RP to validate resource groups before requests reach the frontend. When a list handler encounters a "not found" condition (empty results for clusters, or Cosmos 404 for the parent cluster in node pool/external auth cases), it now checks whether the resource group actually exists. If not, it returns ResourceGroupNotFoundError (404). If the resource group exists but the parent cluster doesn't (node pool and external auth listing), it returns ParentResourceNotFoundError (404). These are proper *CloudError values that the error handler (internal/errorutils/error.go) handles directly, avoiding the ResourceIDFromContext fallback that fails on list endpoints (where no single resource ID exists in the context) and causes an internal server error. The behaviour should now be consistent with higher environments.Front end changes will not be implemented, instead The PR adds an armResourceGroupValidationPolicy to the E2E test harness that simulates ARM's resource group validation in development environment. For every request targeting the RP frontend, the policy checks whether the resource group exists in Azure and returns a 404 if it doesn't, the same behavior ARM provides in production. This enables the ARO-22570 and ARO-22571 test cases for listing clusters and node pools in nonexistent resource groups, without any changes to the frontend.
Scenario removed: listing a resource with non-existent parent. Removed as it can't be implemented without significant front end changes.
Note:
tests will be enabled in #4876 as the frontend changes need to populate first#4876 will be closed as the tests can be enabled within the PR as the fix nowWhy
Implement missing error handling and unblock negative listing cases both for clusters and node pools.
Testing
Tests were initially blocked by the bug. Enabling the tests again and having the run succeed should warrant the bug fix to work.
Bug fix was verified through successful listing test runs in personal dev environment
with a locally built frontend.