fix(e2e): drop hardcoded D2s_v3 system pool from MA35D scenarios#8375
ganeshkumarashok wants to merge 1 commit into main from
Both Test_AzureLinuxV3_MA35D and Test_AzureLinuxV3_MA35D_Scriptless
pinned the AKS system node pool to Standard_D2s_v3 in eastus. That SKU
is currently subscription-restricted across all eastus availability zones
("All availability zones are restricted for this SKU"), so cluster
creation fails with 400 BadRequest before the scenario can exercise
the MA35D GPU node under test.
Removing the override lets the system pool fall back to config.DefaultVMSKU
(Standard_D2ds_v5), which every other GPU/non-GPU scenario already uses
successfully for the system pool. The MA35D GPU SKU itself
(Standard_NM16ads_MA35D) is unchanged.
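A minimal Go sketch of the fallback described above, assuming the scenario config treats an empty K8sSystemPoolSKU as "use the default". The Scenario struct, its GPUSKU field, and the systemPoolSKU helper are hypothetical stand-ins, not the repo's actual types. DefaultVMSKU is a local constant that mirrors config.DefaultVMSKU.

```go
package main

import "fmt"

// DefaultVMSKU mirrors the e2e default named in this PR (config.DefaultVMSKU).
// It is a local constant here for illustration only.
const DefaultVMSKU = "Standard_D2ds_v5"

// Scenario is a hypothetical, trimmed-down stand-in for the e2e scenario config.
type Scenario struct {
	Name             string
	Location         string
	K8sSystemPoolSKU string // empty means "use DefaultVMSKU"
	GPUSKU           string // hypothetical field for the node SKU under test
}

// systemPoolSKU is a hypothetical helper: an explicit override wins,
// otherwise the system pool falls back to DefaultVMSKU.
func systemPoolSKU(s Scenario) string {
	if s.K8sSystemPoolSKU != "" {
		return s.K8sSystemPoolSKU
	}
	return DefaultVMSKU
}

func main() {
	before := Scenario{
		Name:             "Test_AzureLinuxV3_MA35D",
		Location:         "eastus",
		K8sSystemPoolSKU: "Standard_D2s_v3",
		GPUSKU:           "Standard_NM16ads_MA35D",
	}
	after := before            // struct copy
	after.K8sSystemPoolSKU = "" // the override removed by this PR

	fmt.Println(systemPoolSKU(before)) // Standard_D2s_v3
	fmt.Println(systemPoolSKU(after))  // Standard_D2ds_v5
	fmt.Println(after.GPUSKU)          // Standard_NM16ads_MA35D (unchanged)
}
```

Only the system pool SKU changes; the location and the GPU SKU under test are carried over untouched.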
Observed in build 161380177 on PR #8228:
RESPONSE 400: 400 Bad Request
The VM size of 'Standard_D2s_v3' is currently not available in your
subscription in location 'eastus'.
Pull request overview
Removes a hardcoded Kubernetes system node pool VM SKU from the two AzureLinuxV3 MA35D GPU e2e scenarios so they rely on the standard e2e default (config.Config.DefaultVMSKU), avoiding Standard_D2s_v3 availability/subscription restrictions in eastus while keeping the MA35D GPU SKU under test unchanged.
Changes:
- Dropped K8sSystemPoolSKU: "Standard_D2s_v3" from Test_AzureLinuxV3_MA35D.
- Dropped K8sSystemPoolSKU: "Standard_D2s_v3" from Test_AzureLinuxV3_MA35D_Scriptless.
- Left Location: "eastus" in place for MA35D capacity, allowing the system pool SKU to fall back to config.Config.DefaultVMSKU.
Summary
- Drop K8sSystemPoolSKU: "Standard_D2s_v3" from Test_AzureLinuxV3_MA35D and Test_AzureLinuxV3_MA35D_Scriptless
- The system pool falls back to config.DefaultVMSKU (Standard_D2ds_v5), the same SKU every other scenario uses for the AKS system node pool
- The MA35D GPU SKU (Standard_NM16ads_MA35D) is unchanged
Background
Both MA35D scenarios pin Location: "eastus" (the only region with MA35D capacity) and override the system node pool to the older v3 SKU. The override predates the v5 default and is no longer necessary.
The pinned SKU is currently subscription-restricted across all eastus availability zones for the AB e2e subscription (8ecadfc9-...), so AKS cluster creation fails with 400 BadRequest before the GPU node under test can even be provisioned. This was observed in PR #8228 GPU E2E build 161380177, the same 400 BadRequest quoted above. It caused 5 reported failures in that run (the parent and leaf subtests for both MA35D scenarios), none of which had anything to do with the PR being tested.
Why this fix
- Standard_D2ds_v5 is the established default for all e2e system pools (config/config.go: DefaultVMSKU).
- If D2ds_v5 were also restricted in eastus, we'd see it across many GPU scenarios, not just MA35D. The failure mode would be loud and broad rather than silently misattributed to MA35D.
Test plan
- abe2e-kubenet-v4-* in eastus comes up cleanly with the v5 system pool
- Test_AzureLinuxV3_MA35D and _Scriptless reach the GPU validators