Skip to content

VPN P2S Ingress: Phases 1-4 (bicep + orchestrator + scaffolder + docs)#53

Merged
ChrisKrawczyk merged 22 commits into
mainfrom
feature/vpn-p2s-ingress
Jun 19, 2026
Merged

VPN P2S Ingress: Phases 1-4 (bicep + orchestrator + scaffolder + docs)#53
ChrisKrawczyk merged 22 commits into
mainfrom
feature/vpn-p2s-ingress

Conversation

@ChrisKrawczyk

@ChrisKrawczyk ChrisKrawczyk commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator

VPN P2S Ingress — Phases 1–4

Adds Azure VPN Gateway P2S with Microsoft Entra ID authentication as an additive, optional ingress to the node-based deployment orchestrator. Coexists with the existing AFD edge mode as a "trusted-bypass" path so off-network employees with valid Entra ID can reach the portal even when AFD WAF service-tag allow-lists would block them.

This single PR covers Phases 1–4, which were implemented and reviewed sequentially on this branch. Phase 4 (docs-only) was added directly to this branch by user direction (one PR rather than two) — see WorkflowContext.md NOTE 2026-06-15 (post-Phase-3).

Planning artifacts (uncommitted)

Per WorkflowContext.md Artifact Lifecycle: never-commit, planning artifacts live only in the working tree:

  • Spec: .paw/work/vpn-p2s-ingress/Spec.md
  • Plan: .paw/work/vpn-p2s-ingress/ImplementationPlan.md
  • As-built doc: .paw/work/vpn-p2s-ingress/Docs.md

What each phase contributed

  • Phase 1 (cd48571) — Bicep: new VPN module under deploy/services/base-infra/ plus frontDoorId plumbing in global-infra so the AppGw WAF can distinguish AFD vs. VPN traffic.
  • Phase 2 (59c80e9 + review fixes 14805a9) — Orchestrator: deploy.mjs validator + env threading, AppGw WAF custom-rules file wiring, and the tenantId resolution fix (Option A: thread through env, no implicit az account dependency).
  • Phase 3 (a402914 + review fixes 2831b7d) — Scaffolder: new-env.mjs VPN UX prompts and tests; also fixed a latent foundryEnabled truthy-string bug discovered along the way.
  • Phase 4 (18f91bd) — Docs: .github/skills/pilotswarm-new-env-deploy/SKILL.md (VPN row in topology table; EDGE_MODE × TLS_SOURCE × VPN_GATEWAY_ENABLED matrix with rejected combos and named diagnostic codes; "Optional: VPN Gateway P2S" subsection covering CA policy, client-profile distribution, audience override, and auto-seeded WAF guards), .github/skills/pilotswarm-aks-deploy/SKILL.md (45+ min / ~$140/mo cost note + cross-references), and docs/deploying-to-aks.md (AFD+VPN row in topology matrix; new "Optional: VPN Gateway P2S" section covering architecture, preconditions, env vars, CA policy, client-profile distribution, the 90/91/92 auto-seeded WAF guards, and the general-purpose APPGW_WAF_CUSTOM_RULES_FILE operator hook). The standalone pilotswarm-vpn-setup skill was dropped in spec — guidance folds into the existing surfaces.

Deviations from plan that survived review

  • tenantId resolution — Option A (env threading). Resolved through the env layer rather than via on-the-fly az calls; keeps non-interactive paths deterministic and avoids hidden CLI dependencies in the orchestrator hot path.
  • Pool-overlap re-check in main(). The validator's overlap check is also invoked from main() for non-interactive paths, so CI / scripted runs get the same guard as the interactive flow.
  • Latent foundryEnabled truthy-string bug fix (Phase 3). Out-of-scope for the VPN spec but caught while editing the scaffolder; fix included since it was a one-line correctness issue in adjacent code.
  • Phase 4 docs landed on this branch (not a _phase4 sub-branch). Convention drift from the prs Review Strategy default for this work item, recorded in WorkflowContext.md.

AAD audience GUID

  • Default: c632b3df-fb67-4d84-bdcf-b95ad541b5c8 — the current Microsoft-registered "Azure VPN Client" audience.
  • Legacy override: 41b23e61-6c1e-4545-b367-cd054e0ed4b4 — documented as a supported override for tenants still pinned to the legacy app registration.

Tests

236 / 236 passing. Phase 4 is docs-only and does not change the test surface.

Out of scope for this PR

  • Phase 5 — operator-driven manual acceptance evidence (six-item checklist in ImplementationPlan.md §Phase 5), tracked separately.

Final-review follow-ups applied (commit d108d8e)

paw-final-review returned PASS-WITH-NOTES; applied three notes in a single follow-up commit on this branch:

  • C-1 (CONSENSUS): VPN combo-error hints in overlay-contracts.mjs repointed from the nonexistent deploy/docs/vpn-p2s.md to the canonical docs/deploying-to-aks.md. Regression guard added in overlay-contracts.test.mjs.
  • S-2: resolveAppgwWafCustomRulesFile() now actually runs on the deploy path. deploy-bicep.mjs parses + structurally validates APPGW_WAF_CUSTOM_RULES_FILE (fail-closed, named error) before invoking az, mirroring the AFD WAF_CUSTOM_RULES_FILE precedent. Wiring guard added in appgw-waf-rules.test.mjs.
  • S-3 (IMPROVE): doc rename BASE_VNET_CIDR -> VNET_CIDR in docs/deploying-to-aks.md (validator never used the BASE_VNET_ form).

Tests: 238 / 238 pass (was 236; +2 new regression guards). No bicep changes.


Post-final-review additions (commit 6714e65) — seamless P2S DNS + dual-URI portal app reg

Live-environment shakeout after the final-review pass surfaced two gaps that broke the end-user portal flow over VPN. Both are fixed in the same branch (PR #53) to keep the feature shippable as one unit:

  • Seamless P2S DNS — VPN clients were connecting fine but nslookup of the portal hostname failed because the gateway wasn't pushing a DNS server through the tunnel. Hosts-file edits / client-side scripts were ruled out by user direction (must be zero-touch for end users).

    • New module deploy/services/base-infra/bicep/dns-resolver.bicep — Azure Private DNS Resolver inbound endpoint with static IP 10.20.19.4.
    • New dns-resolver-inbound-subnet (10.20.19.0/28) on the base-infra VNet, delegated to Microsoft.Network/dnsResolvers (conditional on VPN_GATEWAY_ENABLED=true).
    • VNet properties.dhcpOptions.dnsServers now carries the resolver IP — the gateway forwards this to P2S clients at connect time, so they inherit the resolver automatically with no client-side action. (The classic Microsoft.Network/virtualNetworkGateways ARM schema has no DNS-push field; this is the only supported path.)
    • Cost delta: +$170/mo for the resolver (total VPN+resolver footprint +$450/mo). Reflected in deploy/envs/template.env, deploy/scripts/deploy.mjs cost banner, deploy/scripts/new-env.mjs cost block, the two deploy skills, and the relevant new-env.test.mjs assertions.
  • Private DNS A record matches AppGw listener / AKV cert subject — the initial implementation wrote the A record as <prefix>.<suffix> (e.g. pschkrawvpn.dev.pilotswarm.azure.com), but the AppGw HTTPS listener / AKV cert subject is <prefix>-<regionShort>-portal.<suffix> (e.g. pschkrawvpn-wus3-portal.dev.pilotswarm.azure.com). VPN users hit a NET::ERR_CERT_COMMON_NAME_INVALID warning. Threaded PORTAL_RESOURCE_NAME (already in new-env-generated .env) through base-infra.params.template.json and main.bicep into private-dns-portal.bicep so the A record label now matches the cert subject.

  • Portal app registration auto-registers both URIs on AFD+VPN stamps — after the cert-mismatch fix, sign-in failed with AADSTS50011 because the portal app reg only had the AFD endpoint as a SPA redirect URI. deploy/scripts/auth/Setup-PortalAuth.ps1 now:

    • Accepts -RedirectUri as [string[]] (single string still works).
    • Resolve-RedirectUriFromEnv returns BOTH the AFD endpoint AND PORTAL_HOSTNAME when the bicep cache shows EDGE_MODE=afd + a non-empty VPN_GATEWAY_ID.
    • The append-to-existing-app path handles multiple URIs idempotently.
    • .github/skills/pilotswarm-portal-app-reg/SKILL.md updated to document the AFD+VPN dual-URI behavior.

Live verified on stamp chkrawvpn: VPN connects, portal DNS resolves over the tunnel, TLS handshake clean, Entra sign-in completes.


Post-PR additions (commits 63be9fc, bc521b8) — VPN access management proposal + client profile downloader

Two follow-ups landed after the seamless-DNS work, both gated on user direction (one PR, not two):

  • docs/proposals/vpn-access-management.md (commit 63be9fc). Forward-looking proposal to fold VPN access management into the same deployer-owned model used for portal app roles: per-stamp custom audience app (Setup-VpnAuth.ps1), Set-VpnAccess.ps1 for assign/revoke, and an optional -MirrorToVpn flag on the existing portal-assignments script so the portal allowlist and the VPN allowlist can move together. References three Microsoft Learn pages confirming the custom-audience workflow, the appRoleAssignmentRequired=true gate, the c632b3df-… Azure VPN Client preauthorized scope, and the no-nested-groups constraint. 5-phase implementation preview included. Proposal only — no code changes.

  • deploy/scripts/auth/Get-VpnClientProfile.ps1 + pilotswarm-vpn-client-profile skill (commit bc521b8). Operator-grade helper that wraps az network vnet-gateway vpn-client generate --authentication-method EAPTLS, downloads the gateway-issued zip via the returned SAS URL, and extracts the azurevpnconfig.xml under the gitignored deploy/envs/local/<stamp>/vpn-client/ folder. Reads the gateway resource id from the existing deploy/.tmp/<env>/bicep-outputs.cache.json. New skill documents when to use, sensitivity, identifier formats, four worked examples, and end-user import instructions; pilotswarm-npm-deployer agent updated to reference the skill so it can be offered automatically at the end of a first successful VPN-enabled deploy. The "Distributing the VPN client profile" sections in docs/deploying-to-aks.md and pilotswarm-new-env-deploy/SKILL.md now point at the helper (also corrects a stale vpn-client generate-url reference to vpn-client generate --authentication-method EAPTLS). Live verified on chkrawvpn.

chkraw and others added 6 commits June 15, 2026 15:48
Adds the additive, optional Azure VPN Gateway P2S ingress at the bicep layer (orchestrator/env/scaffolder/docs land in later phases).

- global-infra: expose frontDoorId output from frontdoor-profile.bicep and main.bicep so per-stamp AppGw WAF rules can allow-list AFD via X-Azure-FDID (auto-aliased to FRONT_DOOR_ID through deploy-bicep.mjs).

- base-infra vnet.bicep: optional GatewaySubnet (/27, conditional on vpnGatewayEnabled). Output gatewaySubnetId is empty string when disabled to keep shape stable.

- base-infra vpn-gateway.bicep (NEW): VPN Gateway (RouteBased, OpenVPN, AAD-auth) + Standard PIP + diagnostic settings. SKU @Allowed list excludes Basic. Uses subscription().tenantId (no stamp-level tenant param exists today). AAD endpoint URLs suppressed for no-hardcoded-env-urls (sovereign clouds require audience + endpoints migrated as a tuple).

- base-infra private-dns-portal.bicep (NEW): managed-only private DNS zone + VNet link + A record for portal hostname over the tunnel. Conditional; empty outputs when disabled. Cross-references portal/bicep/main.bicep:239-261 (not reused — different lifecycle).

- base-infra application-gateway.bicep: new params appgwWafCustomRules / vpnGatewayEnabled / vpnClientAddressPool / frontDoorId. Builds auto-seeded WAF custom rules AllowAfd(90) / AllowVpn(91) / BlockOther(92) when vpnGatewayEnabled; concats operator rules (>= prio 100). customRules attached directly to AppGw WAF policy (schema is array, not { rules: [] }).

- base-infra main.bicep: new params with safe defaults (vpnGatewayEnabled=false, vpnGatewaySku=VpnGw1, vpnClientAddressPool=172.16.200.0/24, vpnAadAudience=public-cloud Azure VPN app GUID, appgwWafCustomRules=[], frontDoorId=''). Threads them into VNet/AppGateway, conditionally instantiates VpnGateway (after LogAnalytics) and PortalPrivateDns (after AppGw, edgeMode=='afd' gated). New outputs vpnGatewayId/vpnGatewayPublicIp/vpnClientAddressPool/vpnPrivateDnsZoneId.

- base-infra params template: add VPN_GATEWAY_ENABLED, VPN_GATEWAY_SKU, VPN_CLIENT_ADDRESS_POOL, VPN_AAD_AUDIENCE, FRONT_DOOR_ID placeholders (appgwWafCustomRules NOT placed here — Phase 2 wires it via --parameters @file like AFD).

Deviation from plan: plan called for tenantId to be 'an existing param already threaded by the orchestrator' on base-infra main.bicep. No such param exists today (submodules each default to subscription().tenantId). Followed that pattern: VPN module receives subscription().tenantId from main.bicep. Phase 2 can promote to an explicit AZURE_TENANT_ID-bound param if needed.

Verification:

- az bicep build (base-infra/main, global-infra/main, vpn-gateway, private-dns-portal): exit 0, no warnings on touched files (pre-existing postgres.bicep secure-input warning unchanged).

- npm run test:deploy-scripts: 187 pass / 0 fail (regression baseline preserved).
… tenantId fix

- overlay-contracts.mjs: validateVpnGatewayCombo() (afd/akv/suffix/pool-overlap)
  wired into validateRequiredEnv(); IPv4 CIDR overlap helper (pure JS).
- template.env: VPN Gateway P2S section (AKV-only constraint, ~\/mo,
  45+ min, CA policy reminder on app c632b3df-...) + APPGW_WAF_CUSTOM_RULES_FILE
  block mirroring the AFD WAF_CUSTOM_RULES_FILE shape.
- base-infra/bicep/main.bicep: new 'tenantId' param threaded from
  AZURE_TENANT_ID via the params template; VpnGateway module now uses
  this explicit param instead of subscription().tenantId (Spec FR-001/FR-008).
  keyvault/postgres modules keep their existing defaults — fix scoped to
  the VPN gateway path only (carry-forward Option A from Phase 1 impl-review).
- deploy-bicep.mjs: APPGW_WAF_CUSTOM_RULES_FILE pass-through for base-infra
  mirroring the AFD WAF_CUSTOM_RULES_FILE machinery (gitignored location,
  same error shape on missing file).
- deploy.mjs printHelp: VPN time/cost note (45+ min, ~\/mo, AFD coexistence).
- lib/appgw-waf-rules.mjs (NEW): buildAutoSeedRules + buildMergedCustomRules
  + resolveAppgwWafCustomRulesFile — JS shadow of the bicep autoSeedRules /
  concat var. Kept in lockstep with application-gateway.bicep by code review;
  guarded by a bicep-shape parity test.
- Tests:
  * overlay-contracts.test.mjs: 5 rejection + valid + disabled + malformed
    + VNET_CIDR override + aggregation cases for validateVpnGatewayCombo.
  * appgw-waf-rules.test.mjs (NEW): four merged-shape cases
    (VPN on/off x ops empty/non-empty) + AllowAfd/AllowVpn/BlockOther
    structural assertions + file-resolution (relative/absolute/missing/
    malformed/non-array) + frontDoorId aliasFor() → FRONT_DOOR_ID +
    bicep parity guard.
- package.json: register appgw-waf-rules.test.mjs in test:deploy-scripts.
…IMPROVE-1)

Addresses Phase 2 impl-review findings:

SHOULD-FIX-1: validateRequiredEnv now returns { missing, combo } — VPN

combo errors travel in a separate channel from missing[] so deploy.mjs

and new-env.mjs render them with named-error messages and remediation

hints that point at the env file / docs, not at the scaffolder (which

would clobber operator edits and is the wrong fix).

SHOULD-FIX-2: bicep<->JS parity guard now asserts literal tokens

(X-Azure-FDID, frontDoorId, IPMatch, RemoteAddr, vpnClientAddressPool,

0.0.0.0/0, action values) per-rule, sliced by rule-name boundaries.

Prevents a divergent edit on either side from sneaking past the test.

IMPROVE-1: new vpn-requires-tenant-id combo error closes the gap

behind bicep's 'param tenantId string = ''' default — a blanked

AZURE_TENANT_ID with VPN enabled now fails closed pre-deploy.

Tests: 218 pass (was 215), 3 added.
Adds two new INPUTS to deploy/scripts/new-env.mjs (--vpn-enabled, --vpn-client-address-pool) using the existing declarative pattern. The yes/no VPN prompt is asked after edge/tls; on yes the scaffolder hard-refuses incompatible combos with a named [vpn-incompatible-combo] error before any partial .env is written, and validates VPN_CLIENT_ADDRESS_POOL against the stamp VNet CIDR by reusing Phase 2's validateVpnGatewayCombo (no duplicated overlap arithmetic).

VPN_GATEWAY_SKU and VPN_AAD_AUDIENCE flow through unchanged from template.env defaults (VpnGw1, c632b3df-...). Dropped-from-spec keys (VPN_AAD_TENANT_ID, VPN_PRIVATE_DNS_MODE, PRIVATE_DNS_ZONE_ID, VPN_AAD_USERS_GROUP_NAME_HINT) are never emitted.

On VPN=yes a clearly-delimited post-scaffold reminder block surfaces the CA-policy requirement (target app c632b3df-..., named users group, MFA, do-not-require-compliance), legacy 41b23e61-... audience override, ~/mo cost, 45+ min first-deploy time, and a pointer to the docs/deploying-to-aks.md VPN section (Phase 4 will write the section body).

Tests: 9 new cases in deploy/scripts/test/new-env.test.mjs covering happy path, both refusal combos, custom + overlapping pool, reminder content, and the VPN=no regression. npm run test:deploy-scripts: 227 passing.
Addresses Phase 3 impl-review findings (review FAIL):

BLOCKING: --vpn-enabled yes/true silently scaffolded VPN_GATEWAY_ENABLED=false because deriveTargets() only honoured literal 'y'/boolean true while main() normalised via normaliseYesNo(). Route both vpnEnabled and foundryEnabled (audit) through normaliseYesNo() so any truthy value (y, yes, true, boolean true) consistently enables the feature.

SHOULD-FIX 1: VPN reminder block now includes Azure-portal client-profile download path (Resource group -> gateway -> Point-to-site configuration -> Download VPN client) plus an 'az network vnet-gateway vpn-client generate-url' CLI alternative, matching the plan requirement.

SHOULD-FIX 2: vpnEnabled / vpnClientAddressPool INPUTS entries moved to immediately after tlsSource. Combo refusal (assertVpnGatewayCombo) and pool-overlap re-check now fire inline in gatherInputs() right after each value resolves, so the operator hits the gate before any secret prompts. main()'s late check is preserved as defense-in-depth for non-interactive paths.

SHOULD-FIX 3: Removed 'akv-selfsigned' from VPN-compatible help text, menu choice description, and [vpn-incompatible-combo] error message. Spec mandates TLS_SOURCE=akv only; the validator already rejected akv-selfsigned correctly.

Tests: added regression coverage for all four findings (CLI yes/true, programmatic boolean, INPUTS prompt-order placement, download-path reminder text, akv-only error wording, foundryEnabled audit). 236/236 deploy-script tests pass.
Folds VPN P2S ingress documentation into the existing operator surfaces - the standalone pilotswarm-vpn-setup skill was dropped in spec.

- .github/skills/pilotswarm-new-env-deploy/SKILL.md: add VPN row to topology table; expand EDGE_MODE x TLS_SOURCE matrix to include VPN_GATEWAY_ENABLED with rejected combos and named diagnostic codes; add 'Optional: VPN Gateway P2S' subsection covering constraints, cost/time (~$140/mo, 45+ min), AAD audience default + legacy override, auto-seeded WAF guards (90/91/92), 'Setting up Conditional Access' (target app, named users group, MFA, no device compliance), and 'Distributing the VPN client profile' (portal + az CLI).

- .github/skills/pilotswarm-aks-deploy/SKILL.md: add Core Learning noting VPN is a GitOps-IaC-path feature, with the 45+ min first-deploy and ~$140/mo cost disclosure plus cross-references to pilotswarm-new-env-deploy and docs/deploying-to-aks.md.

- docs/deploying-to-aks.md: add AFD+VPN row to the topology matrix; new 'Optional: VPN Gateway P2S' section covering architecture, preconditions, env vars, CA policy, client-profile distribution, the auto-seeded 90/91/92 WAF guards, and the general-purpose APPGW_WAF_CUSTOM_RULES_FILE operator hook (parallel to the AFD-side WAF_CUSTOM_RULES_FILE).

Docs-only - no bicep, orchestrator, or test changes. npm run test:deploy-scripts still 236/236.
@ChrisKrawczyk ChrisKrawczyk changed the title VPN P2S Ingress: Phases 1-3 (bicep + orchestrator + scaffolder) VPN P2S Ingress: Phases 1-4 (bicep + orchestrator + scaffolder + docs) Jun 16, 2026
chkraw and others added 16 commits June 16, 2026 08:57
Addresses paw-impl-review findings on the Phase 4 docs commit:

- SHOULD-FIX: docs/deploying-to-aks.md topology intro now describes the EDGE_MODE x TLS_SOURCE x VPN_GATEWAY_ENABLED axes with five supported rows (was 'four combinations / two blocked' for the pre-VPN matrix).

- SHOULD-FIX: AKV-only constraint wording in both docs/deploying-to-aks.md and .github/skills/pilotswarm-new-env-deploy/SKILL.md now distinguishes the validator (vpn-requires-akv accepts akv + akv-selfsigned) from deploy.mjs UNSUPPORTED_COMBOS (additionally rejects afd+akv-selfsigned). The only effective AFD+VPN combo remains TLS_SOURCE=akv; the source of each diagnostic is now explicit.

No code or test changes. npm run test:deploy-scripts still 236/236.
Final-review (PR #53, PASS-WITH-NOTES) follow-ups:

* C-1 (CONSENSUS): VPN combo-error hints in overlay-contracts.mjs pointed
  to `deploy/docs/vpn-p2s.md` which does not exist. Repoint all six hints
  (five combo codes + generic fallback) at the canonical
  `docs/deploying-to-aks.md` -- the same doc new-env.mjs already
  references on the VPN summary. Add a regression-guard test in
  overlay-contracts.test.mjs that scans the source for `deploy/docs/`.

* S-2: `resolveAppgwWafCustomRulesFile()` (parses + validates JSON array
  shape, fail-closed with named error) was dead from the deploy path --
  deploy-bicep.mjs only did `existsSync` before handing the file to az.
  Wire the helper into the APPGW_WAF_CUSTOM_RULES_FILE branch BEFORE the
  az invocation, keeping `existsSync` as the first gate (distinct
  missing-file vs malformed-JSON diagnostics, mirroring the AFD
  WAF_CUSTOM_RULES_FILE precedent). Add a source-scan wiring guard in
  appgw-waf-rules.test.mjs (matches the bicep <-> JS parity pattern in
  the same file).

* S-3 (IMPROVE): docs/deploying-to-aks.md referenced a nonexistent
  `BASE_VNET_CIDR`; the validator uses `VNET_CIDR`. Pure doc rename.

Tests: 238 / 238 pass (was 236; +2 new regression guards).
No bicep changes; az bicep build not re-run.
Replaces CorpNet / CorpNetPublic / CorpNetSAW references in VPN-feature

files with vendor-neutral phrasing that preserves the trusted-bypass

intent (operator-defined AFD WAF allow-lists; service-tag, IP-range, or

header-based rules gating the public ingress to a known managed-network

population).

- SKILL.md: matrix row + 'Optional: VPN Gateway P2S' section

- docs/deploying-to-aks.md: architecture prose + ASCII diagram labels

  (allow-listed public user / off-allow-list authenticated user)

- vpn-gateway.bicep: module header

- appgw-waf-rules.test.mjs: fixture rule AllowCorpNet -> AllowOperator

- template.env: VPN section (pre-existing AFD WAF leaks at lines 72/79

  deferred to a separate follow-up).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Removes references to PAW workflow internals (Phase 1/2/3/4 markers,

Spec FR-xxx IDs, Plan section refs, SHOULD-FIX/BLOCKING review labels,

CodeResearch citations) that leaked into code/bicep comments during

the VPN P2S ingress implementation. These markers were meaningful only

inside the PAW session and read as noise to anyone (human or AI)

encountering the files later without that context.

Comments rewritten to describe what the code does and why, without

referencing planning artifacts that aren't part of the repository.

Scope: limited to comments introduced by the VPN PR. Pre-existing

Phase 4 / FR-022 references from earlier work (deploy-bicep.mjs,

main.bicep header, etc.) are deferred — same treatment as the

pre-existing AFD WAF CorpNet leaks at template.env:72,79.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Three focused additions so the agent triages VPN-enabled stamps as

first-class as AFD-only stamps:

1. Path Selection: new signal row routing VPN/P2S/Entra-ID-VPN intent

   to the new-env-deploy skill's 'Optional: VPN Gateway P2S' section.

2. Pre-validate paragraph: surfaces the VPN combo gates

   ([vpn-incompatible-combo], vpn-requires-afd, vpn-requires-akv,

   [vpn-pool-overlap]) so the agent calls them out BEFORE the user

   hits the named error from new-env.mjs / deploy.mjs.

3. Common Pitfalls block: two new VPN entries analogous to the AFD

   propagation-delay note —

     a) Gateway provisioning lead time: 45+ min on first deploy is

        expected, not a failure; don't interrupt, re-run, or

        --force-module the bicep step.

     b) Conditional Access policy is a tenant-admin OOB step the

        deploy cannot enforce; surface it before declaring a stamp

        ready for use.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When TLS_SOURCE=akv (required for VPN-enabled stamps), the scaffolder

now prompts for SSL_CERT_DOMAIN_SUFFIX interactively and accepts

--ssl-cert-domain-suffix <suffix> non-interactively. Previously this

key had to be hand-edited post-scaffold, which the agent had no way to

surface up-front. Stub overlays (letsencrypt, akv-selfsigned) are

unaffected — applyStubKeys continues to stamp 'unused'.

Agent + skill updated to reflect the new prompt/flag and drop the

hand-edit guidance.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…dance

Closes the gap where an engineer saying 'spin up a VPN-enabled env'

would not see VPN_CLIENT_ADDRESS_POOL / VPN_AAD_AUDIENCE /

SSL_CERT_DOMAIN_SUFFIX as confirmable rows in the Step 2 defaults

table. Adds explicit guidance that 'VPN access' auto-implies

edge-mode=afd + tls-source=akv so the user isn't asked to discover the

only valid combo, and that 45-min provisioning + the CA policy

prerequisite must be surfaced before invoking the scaffolder.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…stamps

Each stamp must get its own dedicated Entra app. The agent was

inferring 'reuse this app' intent from PORTAL_AUTH_ENTRA_CLIENT_ID

values present in sibling stamps' .env files when those stamps were

used as references for subscription/region/tenant. Reusing the client

id couples stamp lifecycles, pollutes redirect URI lists, and creates

the 'shared app blast radius' problem the rest of the guidance

already warns against.

Pull non-auth values from sibling stamps freely; the client id is

off-limits unless the user explicitly and unprompted asks to reuse a

specific app.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
VPN-enabled stamps were failing the base-infra bicep step with

'VpnGw1                                    # @Allowed: ...' is not

part of the allowed value(s). The template.env documents allowed

values inline after the '=' and parseEnvFile was passing that whole

string through to substitute-env.mjs and on into the bicep parameter

file.

Strip inline ' #' comments from unquoted values (standard dotenv

behavior). Quoted values and unquoted values with '#' that isn't

preceded by whitespace (URL fragments) are preserved. Full-line '#'

comments continue to be skipped.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Azure no longer accepts non-AZ VPN gateway SKUs ('VpnGw1', 'VpnGw2',

'VpnGw3') for new VPN gateways:

  NonAzSkusNotAllowedForVPNGateway: VpnGw1-5 non-AZ SKUs are no

  longer supported for VPN gateways. Only VpnGw1-5AZ SKUs can be

  created going forward.

Switch the bicep default + template.env default to VpnGw1AZ, remove

the non-AZ entries from the bicep @Allowed list, and update all

user-facing references (docs, agent skills, post-scaffold reminder,

reminder block in deploy.mjs).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
AZ VPN gateway SKUs (VpnGw{1,2,3}AZ) require their associated

Public IP to be zone-redundant. Without zones configured, Azure

rejects the deployment:

  VmssVpnGatewayPublicIpsMustHaveZonesConfigured: Standard Public

  IPs associated with VPN Gateways with AZ VPN skus must have

  zones configured.

Add zones: ['1','2','3'] unconditionally — the SKU list is now

AZ-only (non-AZ SKUs were deprecated in the previous commit).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…tal app reg

- Add Private DNS Resolver inbound endpoint (10.20.19.4) on dedicated /28
- Push resolver IP to P2S clients via VNet dhcpOptions.dnsServers
- Fix Private DNS A record name to match AppGw listener / AKV cert subject
  (thread PORTAL_RESOURCE_NAME into base-infra)
- Setup-PortalAuth: auto-register both AFD + VPN portal URIs on AFD+VPN stamps
- Cost: +$170/mo for Private DNS Resolver (total +$450/mo for full VPN+resolver)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Captures the verified Microsoft Learn pattern for moving VPN access
control off tenant-admin Conditional Access onto deployer-owned per-stamp
custom audience apps with appRoleAssignmentRequired=true. Mirrors the
portal auth-assignments model and proposes -MirrorToVpn semantics so
granting portal access also grants VPN access by default.

Follow-up to PR #53. Not implemented here.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…file skill

Wraps `az network vnet-gateway vpn-client generate` (EAPTLS) to download the Azure VPN client profile (azurevpnconfig.xml) into the gitignored per-stamp deploy/envs/local/<env>/vpn-client/ folder. Adds a SKILL.md documenting when/how to use it, references it from the pilotswarm-npm-deployer agent so the deployer can offer it automatically at the end of a first successful VPN-enabled deploy, and refreshes the VPN client distribution sections in docs/deploying-to-aks.md and pilotswarm-new-env-deploy/SKILL.md to recommend the helper (also corrects the stale `vpn-client generate-url` reference to `vpn-client generate --authentication-method EAPTLS`). Tested live on chkrawvpn.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pre-merge audit found three real drift items between the docs and the shipped behavior:

1. new-env.mjs interactive VPN prompt advertised ~$140/mo, but the cost banner in the same scaffolder run (and template.env, deploy.mjs, and every doc surface) is ~$450/mo total (~$280 VPN + ~$170 DNS Resolver). Fixed both occurrences in INPUTS.

2. new-env.mjs post-scaffold VPN reminder pointed at the stale 'az network vnet-gateway vpn-client generate-url' subcommand. The shipped helper script, docs/deploying-to-aks.md, and the pilotswarm-new-env-deploy / pilotswarm-vpn-client-profile skills all use the correct 'vpn-client generate --authentication-method EAPTLS' form. Updated the reminder to (a) recommend the new Get-VpnClientProfile.ps1 helper as the preferred path, (b) cross-reference the pilotswarm-vpn-client-profile skill, and (c) emit the correct az subcommand as the CLI fallback.

3. docs/deploying-to-aks.md "Optional: VPN Gateway P2S" used an ambiguous <resourceName>.<SSL_CERT_DOMAIN_SUFFIX> placeholder. The Private DNS A record is keyed on PORTAL_RESOURCE_NAME (e.g. pschkrawvpn-wus3-portal) to match the AppGw listener / AKV cert subject; RESOURCE_PREFIX alone is only the backwards-compat fallback. Clarified inline.

Also updated the new-env vpn-reminder test to assert the corrected subcommand and the helper-script reference. All 49 deploy/scripts/test/new-env.test.mjs tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Web-verified 2026-06-18: no Azure VPN Client exists on the iOS App Store (Windows/macOS/Android only). iOS users therefore cannot use AAD-interactive auth to Azure VPN Gateway at all — not a CA policy issue, just absent product. The only iOS-supported paths are native iOS VPN with IKEv2+cert auth or third-party OpenVPN clients with cert auth.

Added Phase 6 to the implementation table and a new `iOS support requires certificate-based authentication` section covering: dual-auth gateway pattern (bicep supports AAD + vpnClientRootCertificates simultaneously), required scripts (root cert generation, per-user client certs, .mobileconfig generator for iOS one-tap install), critical access-control implications (cert auth bypasses Entra entirely — short validity, per-device certs, revocation list mandatory), and backwards compatibility (additive only).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@ChrisKrawczyk ChrisKrawczyk merged commit 4a4ef5b into main Jun 19, 2026
1 check passed
@ChrisKrawczyk ChrisKrawczyk deleted the feature/vpn-p2s-ingress branch June 19, 2026 05:51
ChrisKrawczyk pushed a commit that referenced this pull request Jun 22, 2026
Brings in v0.2.0 release: VPN P2S ingress (#53), dual redirect URIs,
client-profile helper, gitignore .claude/, proposals doc revisions.

Conflict resolution (all 10 conflicts):
- package.json x3, packages/{cli,sdk}/package.json: took main's 0.2.0
  version bumps (our 0.1.36 bump was branch-local, never released).
- package.json (test:deploy-scripts): merged — kept our 3 test additions
  (build-call-sites, dockerfile-worker, setup-obo-smoke-worker-app) and
  picked up main's appgw-waf-rules.test.mjs.
- package-lock.json: took theirs + npm install reconcile.
- CHANGELOG.md: our entry restructured under '## Unreleased'; main's
  '## 0.2.0 — 2026-06-19' entry inserted below.
- deploy/envs/template.env, both base-infra bicep files, two skill
  markdowns: pure additive blocks (OBO vs VPN sections / params /
  bullets); both halves retained, ours-first ordering.

Validation:
- pilotswarm-sdk@0.2.0: tsc build clean.
- packages/sdk/test/local/plugin-tools-contract.test.js: 19/19 pass.
- npm run test:deploy-scripts: 281/281 pass (was 248 on main; +33
  from this branch's smoke/OBO test additions).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants