VPN P2S Ingress: Phases 1-4 (bicep + orchestrator + scaffolder + docs)#53
Merged
Conversation
Adds the additive, optional Azure VPN Gateway P2S ingress at the bicep layer (orchestrator/env/scaffolder/docs land in later phases). - global-infra: expose frontDoorId output from frontdoor-profile.bicep and main.bicep so per-stamp AppGw WAF rules can allow-list AFD via X-Azure-FDID (auto-aliased to FRONT_DOOR_ID through deploy-bicep.mjs). - base-infra vnet.bicep: optional GatewaySubnet (/27, conditional on vpnGatewayEnabled). Output gatewaySubnetId is empty string when disabled to keep shape stable. - base-infra vpn-gateway.bicep (NEW): VPN Gateway (RouteBased, OpenVPN, AAD-auth) + Standard PIP + diagnostic settings. SKU @Allowed list excludes Basic. Uses subscription().tenantId (no stamp-level tenant param exists today). AAD endpoint URLs suppressed for no-hardcoded-env-urls (sovereign clouds require audience + endpoints migrated as a tuple). - base-infra private-dns-portal.bicep (NEW): managed-only private DNS zone + VNet link + A record for portal hostname over the tunnel. Conditional; empty outputs when disabled. Cross-references portal/bicep/main.bicep:239-261 (not reused — different lifecycle). - base-infra application-gateway.bicep: new params appgwWafCustomRules / vpnGatewayEnabled / vpnClientAddressPool / frontDoorId. Builds auto-seeded WAF custom rules AllowAfd(90) / AllowVpn(91) / BlockOther(92) when vpnGatewayEnabled; concats operator rules (>= prio 100). customRules attached directly to AppGw WAF policy (schema is array, not { rules: [] }). - base-infra main.bicep: new params with safe defaults (vpnGatewayEnabled=false, vpnGatewaySku=VpnGw1, vpnClientAddressPool=172.16.200.0/24, vpnAadAudience=public-cloud Azure VPN app GUID, appgwWafCustomRules=[], frontDoorId=''). Threads them into VNet/AppGateway, conditionally instantiates VpnGateway (after LogAnalytics) and PortalPrivateDns (after AppGw, edgeMode=='afd' gated). New outputs vpnGatewayId/vpnGatewayPublicIp/vpnClientAddressPool/vpnPrivateDnsZoneId. - base-infra params template: add VPN_GATEWAY_ENABLED, VPN_GATEWAY_SKU, VPN_CLIENT_ADDRESS_POOL, VPN_AAD_AUDIENCE, FRONT_DOOR_ID placeholders (appgwWafCustomRules NOT placed here — Phase 2 wires it via --parameters @file like AFD). Deviation from plan: plan called for tenantId to be 'an existing param already threaded by the orchestrator' on base-infra main.bicep. No such param exists today (submodules each default to subscription().tenantId). Followed that pattern: VPN module receives subscription().tenantId from main.bicep. Phase 2 can promote to an explicit AZURE_TENANT_ID-bound param if needed. Verification: - az bicep build (base-infra/main, global-infra/main, vpn-gateway, private-dns-portal): exit 0, no warnings on touched files (pre-existing postgres.bicep secure-input warning unchanged). - npm run test:deploy-scripts: 187 pass / 0 fail (regression baseline preserved).
… tenantId fix
- overlay-contracts.mjs: validateVpnGatewayCombo() (afd/akv/suffix/pool-overlap)
wired into validateRequiredEnv(); IPv4 CIDR overlap helper (pure JS).
- template.env: VPN Gateway P2S section (AKV-only constraint, ~\/mo,
45+ min, CA policy reminder on app c632b3df-...) + APPGW_WAF_CUSTOM_RULES_FILE
block mirroring the AFD WAF_CUSTOM_RULES_FILE shape.
- base-infra/bicep/main.bicep: new 'tenantId' param threaded from
AZURE_TENANT_ID via the params template; VpnGateway module now uses
this explicit param instead of subscription().tenantId (Spec FR-001/FR-008).
keyvault/postgres modules keep their existing defaults — fix scoped to
the VPN gateway path only (carry-forward Option A from Phase 1 impl-review).
- deploy-bicep.mjs: APPGW_WAF_CUSTOM_RULES_FILE pass-through for base-infra
mirroring the AFD WAF_CUSTOM_RULES_FILE machinery (gitignored location,
same error shape on missing file).
- deploy.mjs printHelp: VPN time/cost note (45+ min, ~\/mo, AFD coexistence).
- lib/appgw-waf-rules.mjs (NEW): buildAutoSeedRules + buildMergedCustomRules
+ resolveAppgwWafCustomRulesFile — JS shadow of the bicep autoSeedRules /
concat var. Kept in lockstep with application-gateway.bicep by code review;
guarded by a bicep-shape parity test.
- Tests:
* overlay-contracts.test.mjs: 5 rejection + valid + disabled + malformed
+ VNET_CIDR override + aggregation cases for validateVpnGatewayCombo.
* appgw-waf-rules.test.mjs (NEW): four merged-shape cases
(VPN on/off x ops empty/non-empty) + AllowAfd/AllowVpn/BlockOther
structural assertions + file-resolution (relative/absolute/missing/
malformed/non-array) + frontDoorId aliasFor() → FRONT_DOOR_ID +
bicep parity guard.
- package.json: register appgw-waf-rules.test.mjs in test:deploy-scripts.
…IMPROVE-1)
Addresses Phase 2 impl-review findings:
SHOULD-FIX-1: validateRequiredEnv now returns { missing, combo } — VPN
combo errors travel in a separate channel from missing[] so deploy.mjs
and new-env.mjs render them with named-error messages and remediation
hints that point at the env file / docs, not at the scaffolder (which
would clobber operator edits and is the wrong fix).
SHOULD-FIX-2: bicep<->JS parity guard now asserts literal tokens
(X-Azure-FDID, frontDoorId, IPMatch, RemoteAddr, vpnClientAddressPool,
0.0.0.0/0, action values) per-rule, sliced by rule-name boundaries.
Prevents a divergent edit on either side from sneaking past the test.
IMPROVE-1: new vpn-requires-tenant-id combo error closes the gap
behind bicep's 'param tenantId string = ''' default — a blanked
AZURE_TENANT_ID with VPN enabled now fails closed pre-deploy.
Tests: 218 pass (was 215), 3 added.
Adds two new INPUTS to deploy/scripts/new-env.mjs (--vpn-enabled, --vpn-client-address-pool) using the existing declarative pattern. The yes/no VPN prompt is asked after edge/tls; on yes the scaffolder hard-refuses incompatible combos with a named [vpn-incompatible-combo] error before any partial .env is written, and validates VPN_CLIENT_ADDRESS_POOL against the stamp VNet CIDR by reusing Phase 2's validateVpnGatewayCombo (no duplicated overlap arithmetic). VPN_GATEWAY_SKU and VPN_AAD_AUDIENCE flow through unchanged from template.env defaults (VpnGw1, c632b3df-...). Dropped-from-spec keys (VPN_AAD_TENANT_ID, VPN_PRIVATE_DNS_MODE, PRIVATE_DNS_ZONE_ID, VPN_AAD_USERS_GROUP_NAME_HINT) are never emitted. On VPN=yes a clearly-delimited post-scaffold reminder block surfaces the CA-policy requirement (target app c632b3df-..., named users group, MFA, do-not-require-compliance), legacy 41b23e61-... audience override, ~/mo cost, 45+ min first-deploy time, and a pointer to the docs/deploying-to-aks.md VPN section (Phase 4 will write the section body). Tests: 9 new cases in deploy/scripts/test/new-env.test.mjs covering happy path, both refusal combos, custom + overlapping pool, reminder content, and the VPN=no regression. npm run test:deploy-scripts: 227 passing.
Addresses Phase 3 impl-review findings (review FAIL): BLOCKING: --vpn-enabled yes/true silently scaffolded VPN_GATEWAY_ENABLED=false because deriveTargets() only honoured literal 'y'/boolean true while main() normalised via normaliseYesNo(). Route both vpnEnabled and foundryEnabled (audit) through normaliseYesNo() so any truthy value (y, yes, true, boolean true) consistently enables the feature. SHOULD-FIX 1: VPN reminder block now includes Azure-portal client-profile download path (Resource group -> gateway -> Point-to-site configuration -> Download VPN client) plus an 'az network vnet-gateway vpn-client generate-url' CLI alternative, matching the plan requirement. SHOULD-FIX 2: vpnEnabled / vpnClientAddressPool INPUTS entries moved to immediately after tlsSource. Combo refusal (assertVpnGatewayCombo) and pool-overlap re-check now fire inline in gatherInputs() right after each value resolves, so the operator hits the gate before any secret prompts. main()'s late check is preserved as defense-in-depth for non-interactive paths. SHOULD-FIX 3: Removed 'akv-selfsigned' from VPN-compatible help text, menu choice description, and [vpn-incompatible-combo] error message. Spec mandates TLS_SOURCE=akv only; the validator already rejected akv-selfsigned correctly. Tests: added regression coverage for all four findings (CLI yes/true, programmatic boolean, INPUTS prompt-order placement, download-path reminder text, akv-only error wording, foundryEnabled audit). 236/236 deploy-script tests pass.
Folds VPN P2S ingress documentation into the existing operator surfaces - the standalone pilotswarm-vpn-setup skill was dropped in spec. - .github/skills/pilotswarm-new-env-deploy/SKILL.md: add VPN row to topology table; expand EDGE_MODE x TLS_SOURCE matrix to include VPN_GATEWAY_ENABLED with rejected combos and named diagnostic codes; add 'Optional: VPN Gateway P2S' subsection covering constraints, cost/time (~$140/mo, 45+ min), AAD audience default + legacy override, auto-seeded WAF guards (90/91/92), 'Setting up Conditional Access' (target app, named users group, MFA, no device compliance), and 'Distributing the VPN client profile' (portal + az CLI). - .github/skills/pilotswarm-aks-deploy/SKILL.md: add Core Learning noting VPN is a GitOps-IaC-path feature, with the 45+ min first-deploy and ~$140/mo cost disclosure plus cross-references to pilotswarm-new-env-deploy and docs/deploying-to-aks.md. - docs/deploying-to-aks.md: add AFD+VPN row to the topology matrix; new 'Optional: VPN Gateway P2S' section covering architecture, preconditions, env vars, CA policy, client-profile distribution, the auto-seeded 90/91/92 WAF guards, and the general-purpose APPGW_WAF_CUSTOM_RULES_FILE operator hook (parallel to the AFD-side WAF_CUSTOM_RULES_FILE). Docs-only - no bicep, orchestrator, or test changes. npm run test:deploy-scripts still 236/236.
Addresses paw-impl-review findings on the Phase 4 docs commit: - SHOULD-FIX: docs/deploying-to-aks.md topology intro now describes the EDGE_MODE x TLS_SOURCE x VPN_GATEWAY_ENABLED axes with five supported rows (was 'four combinations / two blocked' for the pre-VPN matrix). - SHOULD-FIX: AKV-only constraint wording in both docs/deploying-to-aks.md and .github/skills/pilotswarm-new-env-deploy/SKILL.md now distinguishes the validator (vpn-requires-akv accepts akv + akv-selfsigned) from deploy.mjs UNSUPPORTED_COMBOS (additionally rejects afd+akv-selfsigned). The only effective AFD+VPN combo remains TLS_SOURCE=akv; the source of each diagnostic is now explicit. No code or test changes. npm run test:deploy-scripts still 236/236.
Final-review (PR #53, PASS-WITH-NOTES) follow-ups: * C-1 (CONSENSUS): VPN combo-error hints in overlay-contracts.mjs pointed to `deploy/docs/vpn-p2s.md` which does not exist. Repoint all six hints (five combo codes + generic fallback) at the canonical `docs/deploying-to-aks.md` -- the same doc new-env.mjs already references on the VPN summary. Add a regression-guard test in overlay-contracts.test.mjs that scans the source for `deploy/docs/`. * S-2: `resolveAppgwWafCustomRulesFile()` (parses + validates JSON array shape, fail-closed with named error) was dead from the deploy path -- deploy-bicep.mjs only did `existsSync` before handing the file to az. Wire the helper into the APPGW_WAF_CUSTOM_RULES_FILE branch BEFORE the az invocation, keeping `existsSync` as the first gate (distinct missing-file vs malformed-JSON diagnostics, mirroring the AFD WAF_CUSTOM_RULES_FILE precedent). Add a source-scan wiring guard in appgw-waf-rules.test.mjs (matches the bicep <-> JS parity pattern in the same file). * S-3 (IMPROVE): docs/deploying-to-aks.md referenced a nonexistent `BASE_VNET_CIDR`; the validator uses `VNET_CIDR`. Pure doc rename. Tests: 238 / 238 pass (was 236; +2 new regression guards). No bicep changes; az bicep build not re-run.
Replaces CorpNet / CorpNetPublic / CorpNetSAW references in VPN-feature files with vendor-neutral phrasing that preserves the trusted-bypass intent (operator-defined AFD WAF allow-lists; service-tag, IP-range, or header-based rules gating the public ingress to a known managed-network population). - SKILL.md: matrix row + 'Optional: VPN Gateway P2S' section - docs/deploying-to-aks.md: architecture prose + ASCII diagram labels (allow-listed public user / off-allow-list authenticated user) - vpn-gateway.bicep: module header - appgw-waf-rules.test.mjs: fixture rule AllowCorpNet -> AllowOperator - template.env: VPN section (pre-existing AFD WAF leaks at lines 72/79 deferred to a separate follow-up). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Removes references to PAW workflow internals (Phase 1/2/3/4 markers, Spec FR-xxx IDs, Plan section refs, SHOULD-FIX/BLOCKING review labels, CodeResearch citations) that leaked into code/bicep comments during the VPN P2S ingress implementation. These markers were meaningful only inside the PAW session and read as noise to anyone (human or AI) encountering the files later without that context. Comments rewritten to describe what the code does and why, without referencing planning artifacts that aren't part of the repository. Scope: limited to comments introduced by the VPN PR. Pre-existing Phase 4 / FR-022 references from earlier work (deploy-bicep.mjs, main.bicep header, etc.) are deferred — same treatment as the pre-existing AFD WAF CorpNet leaks at template.env:72,79. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Three focused additions so the agent triages VPN-enabled stamps as
first-class as AFD-only stamps:
1. Path Selection: new signal row routing VPN/P2S/Entra-ID-VPN intent
to the new-env-deploy skill's 'Optional: VPN Gateway P2S' section.
2. Pre-validate paragraph: surfaces the VPN combo gates
([vpn-incompatible-combo], vpn-requires-afd, vpn-requires-akv,
[vpn-pool-overlap]) so the agent calls them out BEFORE the user
hits the named error from new-env.mjs / deploy.mjs.
3. Common Pitfalls block: two new VPN entries analogous to the AFD
propagation-delay note —
a) Gateway provisioning lead time: 45+ min on first deploy is
expected, not a failure; don't interrupt, re-run, or
--force-module the bicep step.
b) Conditional Access policy is a tenant-admin OOB step the
deploy cannot enforce; surface it before declaring a stamp
ready for use.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When TLS_SOURCE=akv (required for VPN-enabled stamps), the scaffolder now prompts for SSL_CERT_DOMAIN_SUFFIX interactively and accepts --ssl-cert-domain-suffix <suffix> non-interactively. Previously this key had to be hand-edited post-scaffold, which the agent had no way to surface up-front. Stub overlays (letsencrypt, akv-selfsigned) are unaffected — applyStubKeys continues to stamp 'unused'. Agent + skill updated to reflect the new prompt/flag and drop the hand-edit guidance. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…dance Closes the gap where an engineer saying 'spin up a VPN-enabled env' would not see VPN_CLIENT_ADDRESS_POOL / VPN_AAD_AUDIENCE / SSL_CERT_DOMAIN_SUFFIX as confirmable rows in the Step 2 defaults table. Adds explicit guidance that 'VPN access' auto-implies edge-mode=afd + tls-source=akv so the user isn't asked to discover the only valid combo, and that 45-min provisioning + the CA policy prerequisite must be surfaced before invoking the scaffolder. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…stamps Each stamp must get its own dedicated Entra app. The agent was inferring 'reuse this app' intent from PORTAL_AUTH_ENTRA_CLIENT_ID values present in sibling stamps' .env files when those stamps were used as references for subscription/region/tenant. Reusing the client id couples stamp lifecycles, pollutes redirect URI lists, and creates the 'shared app blast radius' problem the rest of the guidance already warns against. Pull non-auth values from sibling stamps freely; the client id is off-limits unless the user explicitly and unprompted asks to reuse a specific app. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
VPN-enabled stamps were failing the base-infra bicep step with 'VpnGw1 # @Allowed: ...' is not part of the allowed value(s). The template.env documents allowed values inline after the '=' and parseEnvFile was passing that whole string through to substitute-env.mjs and on into the bicep parameter file. Strip inline ' #' comments from unquoted values (standard dotenv behavior). Quoted values and unquoted values with '#' that isn't preceded by whitespace (URL fragments) are preserved. Full-line '#' comments continue to be skipped. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Azure no longer accepts non-AZ VPN gateway SKUs ('VpnGw1', 'VpnGw2',
'VpnGw3') for new VPN gateways:
NonAzSkusNotAllowedForVPNGateway: VpnGw1-5 non-AZ SKUs are no
longer supported for VPN gateways. Only VpnGw1-5AZ SKUs can be
created going forward.
Switch the bicep default + template.env default to VpnGw1AZ, remove
the non-AZ entries from the bicep @Allowed list, and update all
user-facing references (docs, agent skills, post-scaffold reminder,
reminder block in deploy.mjs).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
AZ VPN gateway SKUs (VpnGw{1,2,3}AZ) require their associated
Public IP to be zone-redundant. Without zones configured, Azure
rejects the deployment:
VmssVpnGatewayPublicIpsMustHaveZonesConfigured: Standard Public
IPs associated with VPN Gateways with AZ VPN skus must have
zones configured.
Add zones: ['1','2','3'] unconditionally — the SKU list is now
AZ-only (non-AZ SKUs were deprecated in the previous commit).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…tal app reg - Add Private DNS Resolver inbound endpoint (10.20.19.4) on dedicated /28 - Push resolver IP to P2S clients via VNet dhcpOptions.dnsServers - Fix Private DNS A record name to match AppGw listener / AKV cert subject (thread PORTAL_RESOURCE_NAME into base-infra) - Setup-PortalAuth: auto-register both AFD + VPN portal URIs on AFD+VPN stamps - Cost: +$170/mo for Private DNS Resolver (total +$450/mo for full VPN+resolver) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Captures the verified Microsoft Learn pattern for moving VPN access control off tenant-admin Conditional Access onto deployer-owned per-stamp custom audience apps with appRoleAssignmentRequired=true. Mirrors the portal auth-assignments model and proposes -MirrorToVpn semantics so granting portal access also grants VPN access by default. Follow-up to PR #53. Not implemented here. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…file skill Wraps `az network vnet-gateway vpn-client generate` (EAPTLS) to download the Azure VPN client profile (azurevpnconfig.xml) into the gitignored per-stamp deploy/envs/local/<env>/vpn-client/ folder. Adds a SKILL.md documenting when/how to use it, references it from the pilotswarm-npm-deployer agent so the deployer can offer it automatically at the end of a first successful VPN-enabled deploy, and refreshes the VPN client distribution sections in docs/deploying-to-aks.md and pilotswarm-new-env-deploy/SKILL.md to recommend the helper (also corrects the stale `vpn-client generate-url` reference to `vpn-client generate --authentication-method EAPTLS`). Tested live on chkrawvpn. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pre-merge audit found three real drift items between the docs and the shipped behavior: 1. new-env.mjs interactive VPN prompt advertised ~$140/mo, but the cost banner in the same scaffolder run (and template.env, deploy.mjs, and every doc surface) is ~$450/mo total (~$280 VPN + ~$170 DNS Resolver). Fixed both occurrences in INPUTS. 2. new-env.mjs post-scaffold VPN reminder pointed at the stale 'az network vnet-gateway vpn-client generate-url' subcommand. The shipped helper script, docs/deploying-to-aks.md, and the pilotswarm-new-env-deploy / pilotswarm-vpn-client-profile skills all use the correct 'vpn-client generate --authentication-method EAPTLS' form. Updated the reminder to (a) recommend the new Get-VpnClientProfile.ps1 helper as the preferred path, (b) cross-reference the pilotswarm-vpn-client-profile skill, and (c) emit the correct az subcommand as the CLI fallback. 3. docs/deploying-to-aks.md "Optional: VPN Gateway P2S" used an ambiguous <resourceName>.<SSL_CERT_DOMAIN_SUFFIX> placeholder. The Private DNS A record is keyed on PORTAL_RESOURCE_NAME (e.g. pschkrawvpn-wus3-portal) to match the AppGw listener / AKV cert subject; RESOURCE_PREFIX alone is only the backwards-compat fallback. Clarified inline. Also updated the new-env vpn-reminder test to assert the corrected subcommand and the helper-script reference. All 49 deploy/scripts/test/new-env.test.mjs tests pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Web-verified 2026-06-18: no Azure VPN Client exists on the iOS App Store (Windows/macOS/Android only). iOS users therefore cannot use AAD-interactive auth to Azure VPN Gateway at all — not a CA policy issue, just absent product. The only iOS-supported paths are native iOS VPN with IKEv2+cert auth or third-party OpenVPN clients with cert auth. Added Phase 6 to the implementation table and a new `iOS support requires certificate-based authentication` section covering: dual-auth gateway pattern (bicep supports AAD + vpnClientRootCertificates simultaneously), required scripts (root cert generation, per-user client certs, .mobileconfig generator for iOS one-tap install), critical access-control implications (cert auth bypasses Entra entirely — short validity, per-device certs, revocation list mandatory), and backwards compatibility (additive only). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ChrisKrawczyk
pushed a commit
that referenced
this pull request
Jun 22, 2026
Brings in v0.2.0 release: VPN P2S ingress (#53), dual redirect URIs, client-profile helper, gitignore .claude/, proposals doc revisions. Conflict resolution (all 10 conflicts): - package.json x3, packages/{cli,sdk}/package.json: took main's 0.2.0 version bumps (our 0.1.36 bump was branch-local, never released). - package.json (test:deploy-scripts): merged — kept our 3 test additions (build-call-sites, dockerfile-worker, setup-obo-smoke-worker-app) and picked up main's appgw-waf-rules.test.mjs. - package-lock.json: took theirs + npm install reconcile. - CHANGELOG.md: our entry restructured under '## Unreleased'; main's '## 0.2.0 — 2026-06-19' entry inserted below. - deploy/envs/template.env, both base-infra bicep files, two skill markdowns: pure additive blocks (OBO vs VPN sections / params / bullets); both halves retained, ours-first ordering. Validation: - pilotswarm-sdk@0.2.0: tsc build clean. - packages/sdk/test/local/plugin-tools-contract.test.js: 19/19 pass. - npm run test:deploy-scripts: 281/281 pass (was 248 on main; +33 from this branch's smoke/OBO test additions). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
VPN P2S Ingress — Phases 1–4
Adds Azure VPN Gateway P2S with Microsoft Entra ID authentication as an additive, optional ingress to the node-based deployment orchestrator. Coexists with the existing AFD edge mode as a "trusted-bypass" path so off-network employees with valid Entra ID can reach the portal even when AFD WAF service-tag allow-lists would block them.
This single PR covers Phases 1–4, which were implemented and reviewed sequentially on this branch. Phase 4 (docs-only) was added directly to this branch by user direction (one PR rather than two) — see
WorkflowContext.mdNOTE 2026-06-15 (post-Phase-3).Planning artifacts (uncommitted)
Per
WorkflowContext.mdArtifact Lifecycle: never-commit, planning artifacts live only in the working tree:.paw/work/vpn-p2s-ingress/Spec.md.paw/work/vpn-p2s-ingress/ImplementationPlan.md.paw/work/vpn-p2s-ingress/Docs.mdWhat each phase contributed
cd48571) — Bicep: new VPN module underdeploy/services/base-infra/plusfrontDoorIdplumbing inglobal-infraso the AppGw WAF can distinguish AFD vs. VPN traffic.59c80e9+ review fixes14805a9) — Orchestrator:deploy.mjsvalidator + env threading, AppGw WAF custom-rules file wiring, and thetenantIdresolution fix (Option A: thread through env, no implicitaz accountdependency).a402914+ review fixes2831b7d) — Scaffolder:new-env.mjsVPN UX prompts and tests; also fixed a latentfoundryEnabledtruthy-string bug discovered along the way.18f91bd) — Docs:.github/skills/pilotswarm-new-env-deploy/SKILL.md(VPN row in topology table;EDGE_MODE × TLS_SOURCE × VPN_GATEWAY_ENABLEDmatrix with rejected combos and named diagnostic codes; "Optional: VPN Gateway P2S" subsection covering CA policy, client-profile distribution, audience override, and auto-seeded WAF guards),.github/skills/pilotswarm-aks-deploy/SKILL.md(45+ min / ~$140/mo cost note + cross-references), anddocs/deploying-to-aks.md(AFD+VPN row in topology matrix; new "Optional: VPN Gateway P2S" section covering architecture, preconditions, env vars, CA policy, client-profile distribution, the 90/91/92 auto-seeded WAF guards, and the general-purposeAPPGW_WAF_CUSTOM_RULES_FILEoperator hook). The standalonepilotswarm-vpn-setupskill was dropped in spec — guidance folds into the existing surfaces.Deviations from plan that survived review
tenantIdresolution — Option A (env threading). Resolved through the env layer rather than via on-the-flyazcalls; keeps non-interactive paths deterministic and avoids hidden CLI dependencies in the orchestrator hot path.main(). The validator's overlap check is also invoked frommain()for non-interactive paths, so CI / scripted runs get the same guard as the interactive flow.foundryEnabledtruthy-string bug fix (Phase 3). Out-of-scope for the VPN spec but caught while editing the scaffolder; fix included since it was a one-line correctness issue in adjacent code._phase4sub-branch). Convention drift from theprsReview Strategy default for this work item, recorded inWorkflowContext.md.AAD audience GUID
c632b3df-fb67-4d84-bdcf-b95ad541b5c8— the current Microsoft-registered "Azure VPN Client" audience.41b23e61-6c1e-4545-b367-cd054e0ed4b4— documented as a supported override for tenants still pinned to the legacy app registration.Tests
236 / 236 passing. Phase 4 is docs-only and does not change the test surface.
Out of scope for this PR
ImplementationPlan.md§Phase 5), tracked separately.Final-review follow-ups applied (commit d108d8e)
paw-final-review returned PASS-WITH-NOTES; applied three notes in a single follow-up commit on this branch:
overlay-contracts.mjsrepointed from the nonexistentdeploy/docs/vpn-p2s.mdto the canonicaldocs/deploying-to-aks.md. Regression guard added inoverlay-contracts.test.mjs.resolveAppgwWafCustomRulesFile()now actually runs on the deploy path.deploy-bicep.mjsparses + structurally validatesAPPGW_WAF_CUSTOM_RULES_FILE(fail-closed, named error) before invokingaz, mirroring the AFDWAF_CUSTOM_RULES_FILEprecedent. Wiring guard added inappgw-waf-rules.test.mjs.BASE_VNET_CIDR->VNET_CIDRindocs/deploying-to-aks.md(validator never used theBASE_VNET_form).Tests: 238 / 238 pass (was 236; +2 new regression guards). No bicep changes.
Post-final-review additions (commit 6714e65) — seamless P2S DNS + dual-URI portal app reg
Live-environment shakeout after the final-review pass surfaced two gaps that broke the end-user portal flow over VPN. Both are fixed in the same branch (PR #53) to keep the feature shippable as one unit:
Seamless P2S DNS — VPN clients were connecting fine but
nslookupof the portal hostname failed because the gateway wasn't pushing a DNS server through the tunnel. Hosts-file edits / client-side scripts were ruled out by user direction (must be zero-touch for end users).deploy/services/base-infra/bicep/dns-resolver.bicep— Azure Private DNS Resolver inbound endpoint with static IP10.20.19.4.dns-resolver-inbound-subnet(10.20.19.0/28) on the base-infra VNet, delegated toMicrosoft.Network/dnsResolvers(conditional onVPN_GATEWAY_ENABLED=true).properties.dhcpOptions.dnsServersnow carries the resolver IP — the gateway forwards this to P2S clients at connect time, so they inherit the resolver automatically with no client-side action. (The classicMicrosoft.Network/virtualNetworkGatewaysARM schema has no DNS-push field; this is the only supported path.)deploy/envs/template.env,deploy/scripts/deploy.mjscost banner,deploy/scripts/new-env.mjscost block, the two deploy skills, and the relevantnew-env.test.mjsassertions.Private DNS A record matches AppGw listener / AKV cert subject — the initial implementation wrote the A record as
<prefix>.<suffix>(e.g.pschkrawvpn.dev.pilotswarm.azure.com), but the AppGw HTTPS listener / AKV cert subject is<prefix>-<regionShort>-portal.<suffix>(e.g.pschkrawvpn-wus3-portal.dev.pilotswarm.azure.com). VPN users hit aNET::ERR_CERT_COMMON_NAME_INVALIDwarning. ThreadedPORTAL_RESOURCE_NAME(already innew-env-generated.env) throughbase-infra.params.template.jsonandmain.bicepintoprivate-dns-portal.bicepso the A record label now matches the cert subject.Portal app registration auto-registers both URIs on AFD+VPN stamps — after the cert-mismatch fix, sign-in failed with
AADSTS50011because the portal app reg only had the AFD endpoint as a SPA redirect URI.deploy/scripts/auth/Setup-PortalAuth.ps1now:-RedirectUrias[string[]](single string still works).Resolve-RedirectUriFromEnvreturns BOTH the AFD endpoint ANDPORTAL_HOSTNAMEwhen the bicep cache showsEDGE_MODE=afd+ a non-emptyVPN_GATEWAY_ID..github/skills/pilotswarm-portal-app-reg/SKILL.mdupdated to document the AFD+VPN dual-URI behavior.Live verified on stamp
chkrawvpn: VPN connects, portal DNS resolves over the tunnel, TLS handshake clean, Entra sign-in completes.Post-PR additions (commits
63be9fc,bc521b8) — VPN access management proposal + client profile downloaderTwo follow-ups landed after the seamless-DNS work, both gated on user direction (one PR, not two):
docs/proposals/vpn-access-management.md(commit63be9fc). Forward-looking proposal to fold VPN access management into the same deployer-owned model used for portal app roles: per-stamp custom audience app (Setup-VpnAuth.ps1),Set-VpnAccess.ps1for assign/revoke, and an optional-MirrorToVpnflag on the existing portal-assignments script so the portal allowlist and the VPN allowlist can move together. References three Microsoft Learn pages confirming the custom-audience workflow, theappRoleAssignmentRequired=truegate, thec632b3df-…Azure VPN Client preauthorized scope, and the no-nested-groups constraint. 5-phase implementation preview included. Proposal only — no code changes.deploy/scripts/auth/Get-VpnClientProfile.ps1+pilotswarm-vpn-client-profileskill (commitbc521b8). Operator-grade helper that wrapsaz network vnet-gateway vpn-client generate --authentication-method EAPTLS, downloads the gateway-issued zip via the returned SAS URL, and extracts theazurevpnconfig.xmlunder the gitignoreddeploy/envs/local/<stamp>/vpn-client/folder. Reads the gateway resource id from the existingdeploy/.tmp/<env>/bicep-outputs.cache.json. New skill documents when to use, sensitivity, identifier formats, four worked examples, and end-user import instructions;pilotswarm-npm-deployeragent updated to reference the skill so it can be offered automatically at the end of a first successful VPN-enabled deploy. The "Distributing the VPN client profile" sections indocs/deploying-to-aks.mdandpilotswarm-new-env-deploy/SKILL.mdnow point at the helper (also corrects a stalevpn-client generate-urlreference tovpn-client generate --authentication-method EAPTLS). Live verified onchkrawvpn.