
fix(e2e): drop hardcoded D2s_v3 system pool from MA35D scenarios#8375

Open
ganeshkumarashok wants to merge 1 commit into main from
e2e-fix-ma35d-system-pool-sku

Conversation

@ganeshkumarashok
Contributor

Summary

  • Remove K8sSystemPoolSKU: "Standard_D2s_v3" from Test_AzureLinuxV3_MA35D and Test_AzureLinuxV3_MA35D_Scriptless
  • The system pool falls back to config.DefaultVMSKU (Standard_D2ds_v5), the same SKU every other scenario uses for the AKS system node pool
  • The GPU SKU under test (Standard_NM16ads_MA35D) is unchanged

Background

Both MA35D scenarios pin Location: "eastus" (the only region with MA35D capacity) and override the system node pool to the older v3 SKU. The override predates the v5 default and is no longer necessary.

The pinned SKU is currently subscription-restricted across all eastus availability zones for the AB e2e subscription (8ecadfc9-...), so AKS cluster creation fails with a 400 before the GPU node under test can even be provisioned. Observed in PR #8228, GPU E2E build 161380177:

RESPONSE 400: 400 Bad Request
ERROR CODE: BadRequest
{
  "code": "BadRequest",
  "message": "The VM size of 'Standard_D2s_v3' is currently not available in your subscription in location 'eastus'. All availability zones are restricted for this SKU. Please try another VM size or deploy to a different location."
}

This caused 5 reported failures in that run (the parent plus leaf subtests for both MA35D scenarios), none of which had anything to do with the PR being tested.

Why this fix

  • Standard_D2ds_v5 is the established default for all e2e system pools (config/config.go: DefaultVMSKU)
  • If D2ds_v5 were also restricted in eastus we'd see it across many GPU scenarios, not just MA35D — i.e. failure mode would be loud and broad, not silently misattributed to MA35D
  • Smallest possible diff (2 lines per test)
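The fix relies on the harness defaulting the system pool SKU when a scenario leaves it unset. A minimal sketch of that fallback, with hypothetical names (ScenarioConfig, systemPoolSKU, defaultVMSKU are illustrative, not the actual agentbaker code):

```go
package main

import "fmt"

// defaultVMSKU mirrors config.DefaultVMSKU in this sketch.
const defaultVMSKU = "Standard_D2ds_v5"

// ScenarioConfig is a hypothetical stand-in for the e2e scenario config.
type ScenarioConfig struct {
	Location         string
	K8sSystemPoolSKU string // empty means "use the shared default"
}

// systemPoolSKU returns the explicit override if set, else the default.
func systemPoolSKU(c ScenarioConfig) string {
	if c.K8sSystemPoolSKU != "" {
		return c.K8sSystemPoolSKU
	}
	return defaultVMSKU
}

func main() {
	// Before this PR: the MA35D scenarios pinned the restricted v3 SKU.
	pinned := ScenarioConfig{Location: "eastus", K8sSystemPoolSKU: "Standard_D2s_v3"}
	// After this PR: no override, so the v5 default is used.
	fixed := ScenarioConfig{Location: "eastus"}

	fmt.Println(systemPoolSKU(pinned))
	fmt.Println(systemPoolSKU(fixed))
}
```

Dropping the two K8sSystemPoolSKU lines is therefore equivalent to adopting whatever config.DefaultVMSKU is at the time the tests run, which is why no replacement SKU needs to be hardcoded.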

Test plan

  • Agentbaker GPU E2E (the only pipeline that exercises MA35D)
  • Confirm cluster abe2e-kubenet-v4-* in eastus comes up cleanly with v5 system pool
  • Confirm Test_AzureLinuxV3_MA35D and _Scriptless reach the GPU validators

Both Test_AzureLinuxV3_MA35D and Test_AzureLinuxV3_MA35D_Scriptless
pinned the AKS system node pool to Standard_D2s_v3 in eastus. That SKU
is currently subscription-restricted across all eastus availability zones
("All availability zones are restricted for this SKU"), so cluster
creation fails with 400 BadRequest before the scenario can exercise
the MA35D GPU node under test.

Removing the override falls back to config.DefaultVMSKU
(Standard_D2ds_v5), which every other GPU/non-GPU scenario already uses
successfully for the system pool. The MA35D GPU SKU itself
(Standard_NM16ads_MA35D) is unchanged.

Observed in build 161380177 on PR #8228:
  RESPONSE 400: 400 Bad Request
  The VM size of 'Standard_D2s_v3' is currently not available in your
  subscription in location 'eastus'.
Contributor

Copilot AI left a comment


Pull request overview

Removes a hardcoded Kubernetes system node pool VM SKU from the two AzureLinuxV3 MA35D GPU e2e scenarios so they rely on the standard e2e default (config.Config.DefaultVMSKU), avoiding Standard_D2s_v3 availability/subscription restrictions in eastus while keeping the MA35D GPU SKU under test unchanged.

Changes:

  • Dropped K8sSystemPoolSKU: "Standard_D2s_v3" from Test_AzureLinuxV3_MA35D.
  • Dropped K8sSystemPoolSKU: "Standard_D2s_v3" from Test_AzureLinuxV3_MA35D_Scriptless.
  • Left Location: "eastus" in place for MA35D capacity, allowing system pool SKU to fall back to config.Config.DefaultVMSKU.
