Add flex CIDR allocator controller #535
Conversation
    func initOCIRetry() {
        initRetryOnce.Do(func() {
            p := common.DefaultRetryPolicy()
            common.GlobalRetry = &p
        })
    }

    func init() { initOCIRetry() }
common.GlobalRetry is the OCI SDK's process-wide default. Importing pkg/flexcidr anywhere (tests, tools, CCM main) silently changes retry behavior for CSI, CCM LB, tagging, and every other OCI call in the binary. This feels like unintended behavior.
If per-operation retry is wanted, passing a RequestMetadata{RetryPolicy: &p} explicitly on the specific requests would be an appropriate alternative.
    func init() { initOCIRetry() }

    func runWithRateLimitRetry(ctx context.Context, logger *zap.SugaredLogger, operation string, fn func(context.Context) error) error {
This function is currently only called in tests, not used otherwise. It should probably either be deleted or used in CreatePrivateIpWithRequest/CreateIpv6WithRequest, which is where rate limits actually apply.
    if len(node.Spec.PodCIDRs) > 0 && len(node.Spec.ProviderID) == 0 {
        logger.Debug("node already has podCIDRs but providerID is empty, skipping")
        return nil
    }
I think this skip condition is currently inverted from what it is intended to be. The log message sounds like an OR, but the test condition is an AND, meaning that we'll most likely never enter this block.
Also, there is currently no early exit for the "already has the expected PodCIDRs" case. This means that every node update event will trigger GetInstance, GetPrimaryVNIC and ListPrivateIps API calls. There's already such an early-exit pattern in the node info controller here:
> I think this skip condition is currently inverted from what it is intended to be. The log message sounds like an OR but the test condition is an AND, meaning that we'll most likely never enter this block.

len(node.Spec.PodCIDRs) > 0 => node already has podCIDRs
&& => but
len(node.Spec.ProviderID) == 0 => providerID is empty

The if block looks good to me.
> Also, there is currently no early exit for the "already has the expected PodCIDRs" case.

I agree, an early exit for already-processed nodes is desirable.
Good point, you're right, the if block is good 👍
On the early exit, the new expectedPodCIDRsCache works, but I think it can be improved even further:
- Currently, the 5 minute TTL means we re-do GetInstance + GetPrimaryVNICForInstance + ListPrivateIps for every already-reconciled node on every cache expiry, forever. A plain map[string][]string guarded by a mutex (or sync.Map), populated on successful reconcile and cleared from the existing DeleteFunc, would give zero API traffic in steady state without the TTL churn. The node informer's DeleteFunc is already wired up, so there's no leak risk.
- The if fcc.expectedPodCIDRsCache == nil guards in setExpectedPodCIDRs/deleteExpectedPodCIDRs are unreachable; NewFlexCIDRController always constructs the store.
- The expectedPodCIDRsEntry struct + expectedPodCIDRsCacheKeyFn exist only to satisfy cache.Store's interface; they disappear if you switch to a plain map.
Anyway, none of that is blocking, just minor suggestions ;)
Made the changes now, and also made the controller optional; it can be enabled via env vars.
        return family, nil
    }

    func PatchNodePodCIDRs(ctx context.Context, kubeClient kubernetes.Interface, nodeName string, podCIDRs []string, logger *zap.SugaredLogger) error {
This function does not actually do a "patch"; as currently implemented it does a double update: first a dry-run update, then a retry-on-conflict update, and a final post-update verification get.
For performance and consistency with existing controllers, it would probably be better to use StrategicMergePatch like here:
        "k8s.io/client-go/util/retry"
    )

    var retryRNG = rand.New(rand.NewSource(time.Now().UnixNano()))
This is not safe for concurrent use: with multiple workers calling runWithRateLimitRetry in parallel, this races.
The race can be avoided by using rand.Float64() (global, locked) or a sync.Mutex.
Doesn't seem like we use runWithRateLimitRetry().
- We could remove it.
- Or remove retryRNG by replacing retryRNG.Float64() with rand.Float64(), i.e. jitter := 1 + (rand.Float64()-0.5)*0.2
    if f.ClusterIpFamily.IPv4 != "" {
        ipv4FlexCIDR, err := f.CreateFlexCidr(primaryVnicID, true, false)
        if err != nil {
            return nil, err
        }
        flexCidrs = append(flexCidrs, ipv4FlexCIDR)
    }

    if f.ClusterIpFamily.IPv6 != "" {
        ipv6FlexCIDR, err := f.CreateFlexCidr(primaryVnicID, false, true)
        if err != nil {
            return nil, err
        }
        flexCidrs = append(flexCidrs, ipv6FlexCIDR)
    }
These per-family logic blocks currently run sequentially, but they are independent and could run in parallel.
    }

    func formatIpCidr(ip string, mask *int) string {
        return fmt.Sprintf("%s/%d", ip, *mask)
There's a risk of a nil pointer dereference here if mask is nil.
    type requireAssertions struct{}

    var require requireAssertions
This repository already imports testify, which offers all these testing utilities:
oci-cloud-controller-manager/go.mod, line 53 in 3d019ab
You can directly use functions like assert.EqualValues.
    @@ -0,0 +1,497 @@
    package flexcidr
I think you'll want to add the standard copyright header to all new files:
    for k, v := range instance.Metadata {
        if k == primaryVnicMetadataKey {
You don't need to loop over every metadata entry until you find the right key; you can directly look it up on the map with something like v, ok := instance.Metadata[primaryVnicMetadataKey]
        return nil
    }

    if err := fcc.instanceCache.Add(instance); err != nil {
Let's remove the instance cache since we do not use it. We would in any case need the latest LifecycleState of the instance for the immediately following condition.
@HadrienPatte & @l-technicore I have addressed your comments. Can you please take a look again while I do the manual testing?
Force-pushed b64bc46 to b042c50, then b042c50 to b3e5f01.
@HadrienPatte @l-technicore I have updated the PR to address all your comments, and also made the controller optional; it can be enabled via env vars. Manual testing also looked good. Can you folks give a final review so that we can merge it?
@AkarshES LGTM. I am fine with the latest changes. Any additional optimisations can be accepted via new PRs. |
HadrienPatte
left a comment
Looks great, thanks!
For using external CNIs in direct routing mode, we need to provide a way for IPs to be attached to the worker node. This change introduces a new controller that looks at the instance metadata for a well-known field and processes the allocation of IPs on the worker node.