
CORS-4072: dualstack support for AWS #10353

Open
tthvo wants to merge 5 commits into openshift:main from tthvo:CORS-4076

Conversation

@tthvo
Member

@tthvo tthvo commented Mar 4, 2026

Description

This PR adds dualstack (IPv4 + IPv6) networking support for OCP on the AWS platform. In this configuration, the network infrastructure (i.e. VPC, subnets, load balancers, etc.) and pod/service internal networking will be dualstack.

There are two dualstack variants: dualstack IPv4 primary and dualstack IPv6 primary. See below for the install-config snippet for each case.

This addresses: CORS-4074, CORS-4076, CORS-4077, CORS-4167, CORS-4168, and CORS-4243

Setup Instructions

Use the install-config snippets below to configure networking and the AWS platform.

Note: machineNetwork does not contain an IPv6 CIDR because it is unknown at install time (it will be patched in later once the infrastructure is ready). The cluster network and service network must be provided by the user with valid ULA IPv6 CIDRs.

Dualstack IPv4 Primary:

networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  - cidr: fd01::/48
    hostPrefix: 64
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
  - fd02::/112
platform:
  aws:
    region: us-east-1
    ipFamily: DualStackIPv4Primary
featureSet: TechPreviewNoUpgrade

Dualstack IPv6 Primary:

networking:
  clusterNetwork:
  - cidr: fd01::/48
    hostPrefix: 64
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OVNKubernetes
  serviceNetwork:
  - fd02::/112
  - 172.30.0.0/16
platform:
  aws:
    region: us-east-1
    ipFamily: DualStackIPv6Primary
featureSet: TechPreviewNoUpgrade

Summary by CodeRabbit

  • New Features

    • Optional AAAA (IPv6) DNS records and batching of multiple DNS record changes
    • IPv6/dual‑stack support across machines, VPCs, subnets, load balancers, and API target group IP type
    • Helpers to extract and filter machine CIDRs by IP family
  • Refactor

    • Centralized, chainable ignition editing with improved dual‑stack and custom‑DNS handling
    • IPv6-aware updates for security groups, SSH rules, and node‑port ingress rules
  • Tests

    • Added dual‑stack VPC and subnet test coverage

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 4, 2026
@openshift-ci-robot
Contributor

openshift-ci-robot commented Mar 4, 2026

@tthvo: This pull request references CORS-4072 which is a valid jira issue.

Details

In response to this:

Description

This PR adds dualstack (IPv4 + IPv6) networking support for OCP on the AWS platform. In this configuration, the network infrastructure (i.e. VPC, subnets, load balancers, etc.) and pod/service internal networking will be dualstack.

There are 2 dualstack variants: dualstack IPv4 primary and dualstack IPv6 primary. See below for the install-config snippet to install each case.

This addresses: CORS-4074, CORS-4076, CORS-4077, CORS-4167, CORS-4168, and CORS-4243

Install-config Snippet

Use the below install-config snippet to configure networking and AWS platform.

Note: machineNetwork does not contain an IPv6 CIDR because it is unknown at install time (it will be patched in later once the infrastructure is ready). The cluster network and service network must be provided by the user with valid ULA IPv6 CIDRs.

Dualstack IPv4 Primary:

networking:
 clusterNetwork:
 - cidr: 10.128.0.0/14
   hostPrefix: 23
 - cidr: fd01::/48
   hostPrefix: 64
 machineNetwork:
 - cidr: 10.0.0.0/16
 networkType: OVNKubernetes
 serviceNetwork:
 - 172.30.0.0/16
 - fd02::/112
platform:
 aws:
   region: us-east-1
   ipFamily: DualStackIPv4Primary
featureSet: TechPreviewNoUpgrade

Dualstack IPv6 Primary:

networking:
 clusterNetwork:
 - cidr: fd01::/48
   hostPrefix: 64
 - cidr: 10.128.0.0/14
   hostPrefix: 23
 machineNetwork:
 - cidr: 10.0.0.0/16
 networkType: OVNKubernetes
 serviceNetwork:
 - fd02::/112
 - 172.30.0.0/16
platform:
 aws:
   region: us-east-1
   ipFamily: DualStackIPv6Primary
featureSet: TechPreviewNoUpgrade

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@tthvo
Member Author

tthvo commented Mar 4, 2026

/cc @sadasu @nrb @patrickdillon

@openshift-ci openshift-ci bot requested review from nrb, patrickdillon and sadasu March 4, 2026 02:18

@tthvo
Member Author

tthvo commented Mar 4, 2026

/testwith openshift/installer/main/e2e-aws-ovn-dualstack-ipv4-primary-techpreview openshift/machine-api-provider-aws#177 openshift/machine-api-provider-aws#171

@tthvo
Member Author

tthvo commented Mar 4, 2026

/testwith openshift/installer/main/e2e-aws-ovn-dualstack-ipv6-primary-techpreview openshift/machine-api-provider-aws#177 openshift/machine-api-provider-aws#171

@tthvo
Member Author

tthvo commented Mar 4, 2026

/payload-job periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-aws-usgov-ipi-private-ep-fips-f7

@openshift-ci
Contributor

openshift-ci bot commented Mar 4, 2026

@tthvo: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-aws-usgov-ipi-private-ep-fips-f7

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/70bcb810-1773-11f1-831b-7034f4f09cdc-0

@tthvo
Member Author

tthvo commented Mar 4, 2026

/label platform/aws

@tthvo
Member Author

tthvo commented Mar 5, 2026

/test golint

@tthvo
Member Author

tthvo commented Mar 5, 2026

/testwith openshift/installer/main/e2e-aws-ovn-dualstack-ipv4-primary-techpreview openshift/machine-api-provider-aws#177

@tthvo
Member Author

tthvo commented Mar 5, 2026

/testwith openshift/installer/main/e2e-aws-ovn-dualstack-ipv6-primary-techpreview openshift/machine-api-provider-aws#177

Contributor

@patrickdillon patrickdillon left a comment


This LGTM, but I had a few questions, particularly about why we need the node port changes.

The code changes for that and SSH look good, but it's unfortunate we need to do that. I suppose, particularly for SSH, it does not make sense to use Security Group Roles? i.e. we instruct users to add the role to the installer host

I'm less sure about node port... if we do need it, seems like Security Group Role might be ok there?

@patrickdillon
Contributor

patrickdillon commented Mar 5, 2026

Also, I don't really get how the install is completing in the test with the MAPI PR: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/multi-pr-openshift-installer-10353-openshift-machine-api-provider-aws-177-e2e-aws-ovn-dualstack-ipv4-primary-techpreview/2029358234887786496

I thought we were blocked by ingress on this. I know previously you were using the node-port hack to get around the ingress issue, and there are still node-port changes in this PR, but I thought those were just security group rules. Seems like I'm missing something.

@tthvo
Member Author

tthvo commented Mar 6, 2026

I'm less sure about node port... if we do need it, seems like Security Group Role might be ok there?

The security group already has rules allowing node ports (30000-32767) for the node and controlplane roles because we define them in the CNI rules. This means traffic is allowed between nodes.

However, we also need to allow traffic from the load balancers created by CCM for LoadBalancer-type Services, for example, the default ingress router. These load balancers do not have a "role", so all we can do is allow traffic from the VPC/subnet CIDRs (or all IPs)...

I suppose, particularly for SSH, it does not make sense to use Security Group Roles? i.e. we instruct users to add the role to the installer host

Right, unfortunately, the roles are specifically for cluster components (i.e. nodes, load balancers), so we can't really add one to the installer host. Also, a role corresponds to a security group, which means the installer host might be allowed to access more ports (for example, 30000-32767) than intended.

@tthvo
Member Author

tthvo commented Mar 6, 2026

I thought we were blocked by ingress on this. I know previously you were using the node port hack to get around the ingress issue; and there's still nodeport changes in this pr, but I thought those were just security group rules. Seems like I'm missing something

Right, we are still blocked by ingress and CCM as of today. However, in dualstack IPv4 primary, the following happens:

  • The openshift-ingress/router-default Service is created with ipFamily: IPv4 (by default), because ingress and CCM do not support dualstack yet. Only an IPv4 ClusterIP address is assigned to the Service.
  • The ingress load balancer is created as an NLB with the IPv4 IP address type.
  • There is an A record for the ingress LB in the Route 53 hosted zone.

This means IPv4 traffic will flow to the routers just fine even though IPv6 is not supported. To the cluster operators, the setup is the same as regular singlestack IPv4, since they only care whether the router is reachable (regardless of IP family). Thus, the install completed.

For dualstack IPv6 primary, the openshift-ingress/router-default Service has ipFamily: IPv6 by default while the NLB is IPv4. Because there is no conversion from IPv4 LB traffic to the IPv6 ClusterIP, the routers are unreachable, causing the install to fail.

That being said, it's a "false positive" that the install succeeded :D

@patrickdillon
Contributor

However, we also need to allow traffic from the load balancers created by CCM for LoadBalancer type Services, for example, default ingress router. These load balancers do not have "role", so all we can do is to allow traffic from VPC/subnet CIDRs (or all IPs)...

I see! Thinking about this more, as I mentioned above, this should actually be handled by the ingress component... Technically we should not have to deal with this.

In the case that we actually do need to do something like this, I do wonder if we could pre-provision, or use a deterministic name for, a security group, so that ingress creates the LB in a security group we have pre-created rules for.

@tthvo
Member Author

tthvo commented Mar 6, 2026

In the case that we actually do need to do something like this, I do wonder if we could pre-provision, or use a deterministic name for, a security group, so that ingress creates the LB in a security group we have pre-created rules for.

Ahh yess, the installer can pre-create an additional security group with only the allowed node-port ingress rule (30000-32767) and attach it to cluster machines, similar to the AdditionalSecurityGroupIDs feature.

The only issue is that 30000-32767 is still open to all IPs if NodePortIngressRuleCidrBlocks is left unset. From #10353 (comment), I suppose this behaviour won't go away soon, as it's needed for other supported scenarios in non-OCP projects/products.

@patrickdillon
Contributor

patrickdillon commented Mar 9, 2026

Regarding ingress services/nodeport, I realize now I was assuming that AWS would have the same behavior as cloud-provider-gcp, but it does not. So sorry for confusing things. In GCP, the cloud provider will handle opening all the ports for an ingress service, but that does not happen in AWS.

@patrickdillon
Contributor

The only issue is that 30000-32767 is still open to all IPs if NodePortIngressRuleCidrBlocks is left unset. From #10353 (comment), I suppose this behaviour won't go away soon as it's needed for other supported scenarios that other non-ocp projects/products are doing.

Right, I was thinking we would leave NodePortIngressRuleCidrBlocks in place for ipv4, and for ipv6 we could use the security group. The only advantage to my suggestion is that we don't have to jump through hoops during provisioning to determine the cidr and update the manifests on the fly.

That said, the current approach looks good and ingress isn't even sorted out, so maybe that is just something we want to consider once ingress is in place, as it might help simplify our code a bit.



@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/infrastructure/aws/clusterapi/aws.go (1)

564-603: ⚠️ Potential issue | 🟠 Major

isSSHRuleGone may report success prematurely for private installs.

The function checks for hardcoded 0.0.0.0/0 and ::/0 CIDRs, but removeSSHRule identifies bootstrap SSH rules by their description (BootstrapSSHDescription). For private installs, bootstrap SSH rules use VPC/subnet CIDRs (from sshRuleCidrs), not 0.0.0.0/0. This means isSSHRuleGone will return true before the actual bootstrap SSH rules are deleted.

Consider matching by BootstrapSSHDescription (consistent with removeSSHRule) instead of hardcoded CIDRs:

🛠️ Proposed fix to match rules by description
 func isSSHRuleGone(ctx context.Context, client *ec2.Client, sgID string) (bool, error) {
 	// ... existing SG fetch code ...

 	sg := sgs[0]
 	for _, rule := range sg.IpPermissions {
 		if ptr.Deref(rule.ToPort, 0) != 22 {
 			continue
 		}
-		// Check IPv4 rules
+		// Check IPv4 rules for bootstrap SSH by description
 		for _, source := range rule.IpRanges {
-			if source.CidrIp != nil && *source.CidrIp == "0.0.0.0/0" {
-				ruleDesc := ptr.Deref(source.Description, "[no description]")
-				logrus.Debugf("Found ingress rule %s with source cidr %s. Still waiting for deletion...", ruleDesc, *source.CidrIp)
+			if strings.EqualFold(ptr.Deref(source.Description, ""), awsmanifest.BootstrapSSHDescription) {
+				logrus.Debugf("Found bootstrap SSH ingress rule with IPv4 CIDR %s. Still waiting for deletion...", ptr.Deref(source.CidrIp, ""))
 				return false, nil
 			}
 		}
-		// Check IPv6 rules
+		// Check IPv6 rules for bootstrap SSH by description
 		for _, source := range rule.Ipv6Ranges {
-			if source.CidrIpv6 != nil && *source.CidrIpv6 == "::/0" {
-				ruleDesc := ptr.Deref(source.Description, "[no description]")
-				logrus.Debugf("Found ingress rule %s with source cidr %s. Still waiting for deletion...", ruleDesc, *source.CidrIpv6)
+			if strings.EqualFold(ptr.Deref(source.Description, ""), awsmanifest.BootstrapSSHDescription) {
+				logrus.Debugf("Found bootstrap SSH ingress rule with IPv6 CIDR %s. Still waiting for deletion...", ptr.Deref(source.CidrIpv6, ""))
 				return false, nil
 			}
 		}
 	}

 	return true, nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/infrastructure/aws/clusterapi/aws.go` around lines 564 - 603,
isSSHRuleGone currently looks for hardcoded CIDRs (0.0.0.0/0 and ::/0) which
misses bootstrap SSH rules that are identified by description in removeSSHRule;
update isSSHRuleGone to detect any SSH ingress rules (ToPort == 22) that have
the BootstrapSSHDescription (use the same BootstrapSSHDescription symbol) in
their Description fields for both IpRanges and Ipv6Ranges (and any other
permission fields you use for bootstrap rules), and treat presence of those
described rules as "not gone" (return false), otherwise return true; keep the
existing error handling and logging (use ptr.Deref on Description like existing
code) so behavior matches removeSSHRule's identification logic.
♻️ Duplicate comments (1)
pkg/infrastructure/clusterapi/ignition.go (1)

328-378: ⚠️ Potential issue | 🟠 Major

Silent success when clusterConfigDataFile is not found may mask issues.

If clusterConfigDataFile ("/opt/openshift/manifests/cluster-config.yaml") is not present in config.Storage.Files, the function returns nil without an error. Given that updateMachineNetworks is called specifically to update machine networks for dual-stack, a missing cluster-config file is likely an unexpected state that should fail the install rather than silently succeed.

🛠️ Proposed fix to fail when cluster-config is not found
 func updateMachineNetworks(in IgnitionInput, config *igntypes.Config, machineNetworks []types.MachineNetworkEntry) error {
+	found := false
 	for i, fileData := range config.Storage.Files {
 		if fileData.Path != clusterConfigDataFile {
 			continue
 		}
+		found = true

 		contents := strings.Split(*config.Storage.Files[i].Contents.Source, ",")
 		// ... existing logic ...

 		encoded := fmt.Sprintf("%s%s", header, base64.StdEncoding.EncodeToString(configCMContents))
 		// replace the contents with the edited information
 		config.Storage.Files[i].Contents.Source = &encoded

-		break
+		return nil
 	}
-	return nil
+	if !found {
+		return fmt.Errorf("failed to find %s in bootstrap ignition", clusterConfigDataFile)
+	}
+	return nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/infrastructure/clusterapi/ignition.go` around lines 328 - 378, The
function updateMachineNetworks currently iterates config.Storage.Files for
clusterConfigDataFile and silently returns nil if not found; change it to return
an explicit error when the loop finishes without locating clusterConfigDataFile
so the caller knows the expected cluster-config manifest was missing. Locate
updateMachineNetworks and the loop that checks fileData.Path !=
clusterConfigDataFile, and after the loop (or by tracking a found boolean)
return a descriptive fmt.Errorf mentioning clusterConfigDataFile when it was not
found instead of nil; keep all existing decoding/marshalling behavior unchanged
when the file is present.
🧹 Nitpick comments (1)
pkg/asset/manifests/capiutils/helpers.go (1)

57-77: Consider renaming loop variable to avoid shadowing the ipnet package.

The loop variable ipnet shadows the imported package name. While this works correctly here (the package isn't accessed inside the loop), it can cause confusion during maintenance.

♻️ Suggested refactor
 // GetIPv4CIDRs returns only IPNets of IPv4 family.
 func GetIPv4CIDRs(cidrs []ipnet.IPNet) []ipnet.IPNet {
 	var ipv4Nets []ipnet.IPNet
-	for _, ipnet := range cidrs {
-		if netutils.IsIPv4CIDR(&ipnet.IPNet) {
-			ipv4Nets = append(ipv4Nets, ipnet)
+	for _, cidr := range cidrs {
+		if netutils.IsIPv4CIDR(&cidr.IPNet) {
+			ipv4Nets = append(ipv4Nets, cidr)
 		}
 	}
 	return ipv4Nets
 }

 // GetIPv6CIDRs returns only IPNets of IPv6 family.
 func GetIPv6CIDRs(cidrs []ipnet.IPNet) []ipnet.IPNet {
 	var ipv6Nets []ipnet.IPNet
-	for _, ipnet := range cidrs {
-		if netutils.IsIPv6CIDR(&ipnet.IPNet) {
-			ipv6Nets = append(ipv6Nets, ipnet)
+	for _, cidr := range cidrs {
+		if netutils.IsIPv6CIDR(&cidr.IPNet) {
+			ipv6Nets = append(ipv6Nets, cidr)
 		}
 	}
 	return ipv6Nets
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/asset/manifests/capiutils/helpers.go` around lines 57 - 77, The loop
variable name `ipnet` shadows the imported package `ipnet`, causing confusion;
in both GetIPv4CIDRs and GetIPv6CIDRs, rename the loop variable (e.g., use `n`,
`netItem`, or `cidr`) and update references inside the loop (e.g.,
netutils.IsIPv4CIDR(&netItem.IPNet) and ipv4Nets = append(ipv4Nets, netItem)) so
the code no longer shadows the package while preserving use of the ipnet.IPNet
type and behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bd03415a-daaa-4907-8723-6655733350bb

📥 Commits

Reviewing files that changed from the base of the PR and between e47210e and ba7f608.

📒 Files selected for processing (8)
  • pkg/asset/manifests/aws/cluster.go
  • pkg/asset/manifests/capiutils/helpers.go
  • pkg/infrastructure/aws/clusterapi/aws.go
  • pkg/infrastructure/aws/clusterapi/ignition.go
  • pkg/infrastructure/azure/ignition.go
  • pkg/infrastructure/clusterapi/ignition.go
  • pkg/infrastructure/clusterapi/types.go
  • pkg/infrastructure/gcp/clusterapi/ignition.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/infrastructure/azure/ignition.go

@tthvo
Member Author

tthvo commented Mar 10, 2026

/testwith openshift/installer/main/e2e-aws-ovn-dualstack-ipv6-primary-techpreview openshift/machine-api-provider-aws#177

/testwith openshift/installer/main/e2e-aws-ovn-dualstack-ipv4-primary-techpreview openshift/machine-api-provider-aws#177

@tthvo
Member Author

tthvo commented Mar 10, 2026

Phew, I think all CodeRabbit reviews are now addressed and the PR is rebased on top of latest main. PTAL 🙏

In case anyone misses it due to the long bot review, the latest comments/discussions were posted here: #10353 (comment)

@tthvo
Member Author

tthvo commented Mar 10, 2026

/test e2e-aws-ovn e2e-aws-ovn-edge-zones e2e-aws-ovn-shared-vpc-edge-zones

1 similar comment
@tthvo
Member Author

tthvo commented Mar 10, 2026

/test e2e-aws-ovn e2e-aws-ovn-edge-zones e2e-aws-ovn-shared-vpc-edge-zones

tthvo added 5 commits March 10, 2026 11:24
Note: Defining a non-nil (but empty) vpc.ipv6 block will configure cAPA
to provision the network infrastructure with IPv6 capabilities.
The machines need to be configured to:
- Use resource-name hostname type in order to have A and AAAA records
  for its hostname.
- Assign a primary IPv6 address in order to register with IPv6 target
  group when dualstack is IPv6 primary.
- Enable IPv6 IMDS HTTP endpoint.
The VPC IPv6 CIDR is not available at install time (when the installer
provisions the VPC). Thus, we need to add it in the node port rules
later on and ensure the rules are available.
The etcd operator requires IPv6 machine networks in dualstack
networking. Since we may not know the VPC IPv6 CIDR at install time, we
need to add it in later once the infrastructure is ready.
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Mar 10, 2026

@tthvo: This pull request references CORS-4072 which is a valid jira issue.

Details

In response to this:

Description

This PR added dualstack (IPv4 + IPv6) networking support for OCP on AWS platform. In this configurations, network infrastructure (i.e. vpc, subnets, load balancers, etc) and pod/service internal networking will be dualstack.

There are 2 dualstack variants: dualstack IPv4 primary and dualstack IPv6 primary. See below for the install-config snippet to install each case.

This addresses: CORS-4074, CORS-4076, CORS-4077, CORS-4167, CORS-4168, and CORS-4243

Setup Instructions

Use the below install-config snippet to configure networking and AWS platform.

Note: machineNetwork does not contain IPv6 CIDR as it is unknown at install time (i.e. will be patched later when infra is ready). The cluster network and service network must be provided by the user with valid ULA IPv6 CIDRs.

Dualstack IPv4 Primary:

networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  - cidr: fd01::/48
    hostPrefix: 64
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
  - fd02::/112
platform:
  aws:
    region: us-east-1
    ipFamily: DualStackIPv4Primary
featureSet: TechPreviewNoUpgrade

Dualstack IPv6 Primary:

networking:
  clusterNetwork:
  - cidr: fd01::/48
    hostPrefix: 64
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OVNKubernetes
  serviceNetwork:
  - fd02::/112
  - 172.30.0.0/16
platform:
  aws:
    region: us-east-1
    ipFamily: DualStackIPv6Primary
featureSet: TechPreviewNoUpgrade
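The ULA requirement on the IPv6 cluster/service network CIDRs above can be checked mechanically. A minimal sketch using Go's net/netip (not installer code; the installer's own validation may differ):

```go
package main

import (
	"fmt"
	"net/netip"
)

// ula is the Unique Local Address range (RFC 4193); the fd00::/8 half
// (locally assigned) is what install configs such as fd01::/48 use.
var ula = netip.MustParsePrefix("fc00::/7")

// isULACIDR reports whether cidr is an IPv6 prefix inside fc00::/7.
func isULACIDR(cidr string) bool {
	p, err := netip.ParsePrefix(cidr)
	if err != nil || !p.Addr().Is6() || p.Addr().Is4In6() {
		return false
	}
	return ula.Contains(p.Addr())
}

func main() {
	for _, c := range []string{"fd01::/48", "fd02::/112", "2001:db8::/32", "10.128.0.0/14"} {
		fmt.Printf("%-14s ULA=%v\n", c, isULACIDR(c))
	}
}
```

Running this reports the two fd0x:: prefixes from the snippets as ULA, while a globally routable IPv6 prefix or an IPv4 CIDR is rejected.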

Summary by CodeRabbit

  • New Features

  • Optional AAAA (IPv6) DNS records and batching of multiple DNS record changes

  • IPv6/dual‑stack support across machines, VPCs, subnets, load balancers, and API target group IP type

  • Helpers to extract and filter machine CIDRs by IP family

  • Refactor

  • Centralized, chainable ignition editing with improved dual‑stack and custom‑DNS handling

  • IPv6-aware updates for security groups, SSH rules, and node‑port ingress rules

  • Tests

  • Added dual‑stack VPC and subnet test coverage
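The CIDR-family filtering mentioned above can be sketched with the standard library; the installer's real helpers operate on its own ipnet.IPNet type, so this is only an illustration of the technique:

```go
package main

import (
	"fmt"
	"net/netip"
)

// filterByFamily returns the CIDRs in cidrs that match the requested
// family: want4=true keeps IPv4 prefixes, want4=false keeps IPv6 ones.
func filterByFamily(cidrs []string, want4 bool) []string {
	var out []string
	for _, c := range cidrs {
		p, err := netip.ParsePrefix(c)
		if err != nil {
			continue // skip unparseable entries
		}
		if p.Addr().Is4() == want4 {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	nets := []string{"10.0.0.0/16", "fd01::/48", "172.30.0.0/16", "fd02::/112"}
	fmt.Println(filterByFamily(nets, true))  // IPv4 CIDRs
	fmt.Println(filterByFamily(nets, false)) // IPv6 CIDRs
}
```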

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
pkg/infrastructure/aws/clusterapi/aws.go (1)

584-599: ⚠️ Potential issue | 🟠 Major

Match bootstrap SSH rules by description, not by 0.0.0.0/0 / ::/0.

removeSSHRule removes entries by awsmanifest.BootstrapSSHDescription, but this verifier only waits on world-open CIDRs. In private or BYO-subnet installs the bootstrap SSH rule can be scoped to cluster/VPC CIDRs instead, so this can return true before the bootstrap SSH rule has actually been removed. Check both IpRanges and Ipv6Ranges by description to keep the verifier aligned with the removal logic.

Proposed fix
 	for _, rule := range sg.IpPermissions {
 		if ptr.Deref(rule.ToPort, 0) != 22 {
 			continue
 		}
 		// Check IPv4 rules
 		for _, source := range rule.IpRanges {
-			if source.CidrIp != nil && *source.CidrIp == "0.0.0.0/0" {
-				ruleDesc := ptr.Deref(source.Description, "[no description]")
-				logrus.Debugf("Found ingress rule %s with source cidr %s. Still waiting for deletion...", ruleDesc, *source.CidrIp)
+			if strings.EqualFold(ptr.Deref(source.Description, ""), awsmanifest.BootstrapSSHDescription) {
+				logrus.Debugf("Found bootstrap SSH ingress rule with source cidr %s. Still waiting for deletion...", ptr.Deref(source.CidrIp, ""))
 				return false, nil
 			}
 		}
 		// Check IPv6 rules
 		for _, source := range rule.Ipv6Ranges {
-			if source.CidrIpv6 != nil && *source.CidrIpv6 == "::/0" {
-				ruleDesc := ptr.Deref(source.Description, "[no description]")
-				logrus.Debugf("Found ingress rule %s with source cidr %s. Still waiting for deletion...", ruleDesc, *source.CidrIpv6)
+			if strings.EqualFold(ptr.Deref(source.Description, ""), awsmanifest.BootstrapSSHDescription) {
+				logrus.Debugf("Found bootstrap SSH ingress rule with source cidr %s. Still waiting for deletion...", ptr.Deref(source.CidrIpv6, ""))
 				return false, nil
 			}
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/infrastructure/aws/clusterapi/aws.go` around lines 584 - 599, The
verifier currently only checks for world-open CIDRs when scanning rule.IpRanges
and rule.Ipv6Ranges, which can miss bootstrap SSH rules that are scoped to
non-world CIDRs; update the checks in the loops that iterate rule.IpRanges and
rule.Ipv6Ranges to look for source.Description matching
awsmanifest.BootstrapSSHDescription (use ptr.Deref or nil-safe compare) and
return false if such a description is found (so the verifier aligns with
removeSSHRule which deletes by awsmanifest.BootstrapSSHDescription); preserve or
remove the CIDR checks as desired, but ensure the description-based check is
present for both IpRanges and Ipv6Ranges.
🧹 Nitpick comments (1)
pkg/asset/manifests/capiutils/helpers.go (1)

58-76: Minor: Variable shadows imported package name.

The loop variable ipnet shadows the imported github.com/openshift/installer/pkg/ipnet package. While this doesn't cause functional issues since the package isn't used inside the loop body, renaming to cidr would improve clarity.

♻️ Suggested improvement
 func GetIPv4CIDRs(cidrs []ipnet.IPNet) []ipnet.IPNet {
 	var ipv4Nets []ipnet.IPNet
-	for _, ipnet := range cidrs {
-		if netutils.IsIPv4CIDR(&ipnet.IPNet) {
-			ipv4Nets = append(ipv4Nets, ipnet)
+	for _, cidr := range cidrs {
+		if netutils.IsIPv4CIDR(&cidr.IPNet) {
+			ipv4Nets = append(ipv4Nets, cidr)
 		}
 	}
 	return ipv4Nets
 }

 func GetIPv6CIDRs(cidrs []ipnet.IPNet) []ipnet.IPNet {
 	var ipv6Nets []ipnet.IPNet
-	for _, ipnet := range cidrs {
-		if netutils.IsIPv6CIDR(&ipnet.IPNet) {
-			ipv6Nets = append(ipv6Nets, ipnet)
+	for _, cidr := range cidrs {
+		if netutils.IsIPv6CIDR(&cidr.IPNet) {
+			ipv6Nets = append(ipv6Nets, cidr)
 		}
 	}
 	return ipv6Nets
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/asset/manifests/capiutils/helpers.go` around lines 58 - 76, The loop
variable name `ipnet` in GetIPv4CIDRs and GetIPv6CIDRs shadows the imported
ipnet package; rename the loop variable (e.g., to `cidr`) in both functions and
update all uses inside the loops (references to `ipnet.IPNet` -> `cidr.IPNet`)
to avoid shadowing and improve clarity.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/infrastructure/clusterapi/ignition.go`:
- Around line 328-377: The updateMachineNetworks function currently returns nil
when no file with path clusterConfigDataFile is found in config.Storage.Files,
causing silent success; modify updateMachineNetworks to detect when the loop
does not find clusterConfigDataFile (e.g., set a found flag or return after
handling) and return a descriptive error if the file is missing so callers know
the ignition lacked the expected cluster-config file; reference the loop over
config.Storage.Files, the clusterConfigDataFile constant and the break after
updating config.Storage.Files[i].Contents.Source to locate where to add this
check.
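The missing-file guard described in this prompt can be sketched with simplified stand-in types; the real code walks an Ignition config's Storage.Files, and the path constant here is a hypothetical placeholder:

```go
package main

import "fmt"

// file is a simplified stand-in for an ignition storage file entry.
type file struct {
	Path    string
	Content string
}

// clusterConfigPath is a hypothetical placeholder for the real
// clusterConfigDataFile constant.
const clusterConfigPath = "/opt/openshift/cluster-config.yaml"

// updateClusterConfig replaces the contents of the cluster-config file,
// returning an error instead of silently succeeding when it is absent.
func updateClusterConfig(files []file, newContent string) error {
	for i := range files {
		if files[i].Path == clusterConfigPath {
			files[i].Content = newContent
			return nil
		}
	}
	return fmt.Errorf("ignition config has no file at %s", clusterConfigPath)
}

func main() {
	// Missing file: the caller now sees an explicit error.
	missing := []file{{Path: "/etc/hostname", Content: "node0"}}
	fmt.Println(updateClusterConfig(missing, "updated"))

	// Present file: updated in place, nil error.
	present := []file{{Path: clusterConfigPath, Content: "old"}}
	fmt.Println(updateClusterConfig(present, "updated"))
}
```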


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 31d01973-45b0-409a-b123-1fcf7db806c4

📥 Commits

Reviewing files that changed from the base of the PR and between ba7f608 and 52243fa.

📒 Files selected for processing (13)
  • pkg/asset/installconfig/aws/route53.go
  • pkg/asset/machines/aws/awsmachines.go
  • pkg/asset/machines/clusterapi.go
  • pkg/asset/manifests/aws/cluster.go
  • pkg/asset/manifests/aws/zones.go
  • pkg/asset/manifests/aws/zones_test.go
  • pkg/asset/manifests/capiutils/helpers.go
  • pkg/infrastructure/aws/clusterapi/aws.go
  • pkg/infrastructure/aws/clusterapi/ignition.go
  • pkg/infrastructure/azure/ignition.go
  • pkg/infrastructure/clusterapi/ignition.go
  • pkg/infrastructure/clusterapi/types.go
  • pkg/infrastructure/gcp/clusterapi/ignition.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/infrastructure/azure/ignition.go
  • pkg/asset/machines/clusterapi.go

@tthvo
Member Author

tthvo commented Mar 10, 2026

/testwith openshift/installer/main/e2e-aws-ovn-dualstack-ipv6-primary-techpreview openshift/machine-api-provider-aws#177

/testwith openshift/installer/main/e2e-aws-ovn-dualstack-ipv4-primary-techpreview openshift/machine-api-provider-aws#177

@tthvo
Member Author

tthvo commented Mar 10, 2026

/skip

@tthvo
Member Author

tthvo commented Mar 10, 2026

/retest-required

@tthvo
Member Author

tthvo commented Mar 10, 2026

/test e2e-aws-ovn-shared-vpc-edge-zones e2e-aws-byo-subnet-role-security-groups e2e-aws-ovn-edge-zones

@openshift-ci
Contributor

openshift-ci bot commented Mar 11, 2026

@tthvo: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-custom-dns-techpreview 82cff60 link false /test e2e-aws-custom-dns-techpreview
ci/prow/e2e-aws-ovn-fips 52243fa link false /test e2e-aws-ovn-fips
ci/prow/e2e-aws-ovn-heterogeneous 52243fa link false /test e2e-aws-ovn-heterogeneous
ci/prow/e2e-gcp-xpn-custom-dns 52243fa link false /test e2e-gcp-xpn-custom-dns
ci/prow/e2e-azurestack 52243fa link false /test e2e-azurestack
ci/prow/gcp-custom-endpoints-proxy-wif 52243fa link false /test gcp-custom-endpoints-proxy-wif
ci/prow/e2e-gcp-custom-dns 52243fa link false /test e2e-gcp-custom-dns
ci/prow/e2e-aws-ovn-shared-vpc-edge-zones 52243fa link false /test e2e-aws-ovn-shared-vpc-edge-zones


Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@patrickdillon
Contributor

/lgtm

@patrickdillon
Contributor

/verified by e2e-aws-ovn-dualstack (see multi-pr tests)

@openshift-ci-robot
Contributor

@patrickdillon: This PR has been marked as verified by e2e-aws-ovn-dualstack (see multi-pr tests).

Details

In response to this:

/verified by e2e-aws-ovn-dualstack (see multi-pr tests)


@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Mar 13, 2026
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 13, 2026
@tthvo
Member Author

tthvo commented Mar 13, 2026

/tide refresh


Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
  • platform/aws
  • verified: Signifies that the PR passed pre-merge verification criteria.
