Skip to content

Add OVN RBAC support with per-node ovn-controller certificates#565

Open
slawqo wants to merge 1 commit intoopenstack-k8s-operators:mainfrom
slawqo:issue_osprh_1922_v2
Open

Add OVN RBAC support with per-node ovn-controller certificates#565
slawqo wants to merge 1 commit intoopenstack-k8s-operators:mainfrom
slawqo:issue_osprh_1922_v2

Conversation

@slawqo
Copy link
Copy Markdown
Contributor

@slawqo slawqo commented Apr 29, 2026

Enable OVN role-based access control (RBAC) on the Southbound database so that ovn-controller nodes can only modify their own chassis rows.

When the openstack-operator provides an RBAC issuer name (from a dedicated rootca-ovn-rbac CA, see patch [1]), this patch:

  • Creates per-node cert-manager Certificate CRs for each ovn-controller pod, with CN set to a deterministic UUID5 system-id derived from the node name (ComputeSystemID). This CN must match the chassis system-id for RBAC to authorize operations.

  • Copies the RBAC client cert/key into /etc/openvswitch/ on each node via the config job, and switches ovn-controller to use these dedicated paths instead of the shared OVN DB cert.

  • Mounts the RBAC CA certificate into ovsdbserver-sb pods and builds a combined CA bundle (regular CA + RBAC CA) so the SB database can verify ovn-controller client certificates.

  • Sets role=ovn-controller on the SB DB connection (port 6642) to enforce RBAC.

  • Creates a second SB DB listener on port 16642 with full (unrestricted) access, used by ovn-northd.

  • Updates inactivity probe handling in setup.sh and runtime-config.sh to iterate over all connections, since SB now has two listeners.

[1] openstack-k8s-operators/openstack-operator#1906

Depends-On: openstack-k8s-operators/openstack-operator#1906

Related: #OSPRH-1921

Closes: #OSPRH-1922

@openshift-ci openshift-ci Bot requested review from dprince and stuggi April 29, 2026 14:46
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Apr 29, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: slawqo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Copy Markdown
Contributor

@stuggi stuggi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think a dedicated rootca-ovn-rbac CA is needed here. OVN RBAC only checks that the client cert CN matches the chassis system-id — it does not care which CA signed the cert. The existing rootca-ovn CA can sign the RBAC certs just fine, which would remove the need for the new CA, the combined CA bundle in setup.sh, and the RbacCACertSecretName field. Or am I missing a point?

This would also make the EDPM side much simpler: the existing per-node EDPM OVN cert is already signed by rootca-ovn and has CN= with client auth usage, so it already works for RBAC as-is. The only thing needed on the EDPM side would be to set external-ids:system-id to the hostname so it matches the cert CN, instead of letting ovn-controller generate a random UUID. No new certs or CA changes needed for EDPM.

Also, I think using CN=nodeName instead of CN=UUID5(nodeName) would be simpler and easier to debug — ovn-sbctl list chassis would show actual node names.

What you think?

[update]
initially I was wondering if we could just do the rbac cert creation in the openstack-op for the ovn-controllers and kust provide the list of cert secrets. but then realized that the ovn-controllers use daemonset (was thinking its a statefulset or so), which makes it harder to predict which nodes it gets scheduled to from there, especially if you use nodeselector, or topologyRef. therefor its probably ok to issue the rbac certs for the ovn-controller k8s side in the operator.

Comment on lines -63 to +69
fmt.Sprintf("--certificate=%s", ovn_common.OVNDbCertPath),
fmt.Sprintf("--private-key=%s", ovn_common.OVNDbKeyPath),
fmt.Sprintf("--certificate=%s", ovn_common.OVNControllerCertPath),
fmt.Sprintf("--private-key=%s", ovn_common.OVNControllerKeyPath),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we expect RBAC is always enabled? Without RBAC configured (RbacIssuerName empty), no config job copies certs to these paths. This should probably be conditional:

  if instance.Spec.RbacIssuerName != "" {
      cmd = append(cmd, fmt.Sprintf("--certificate=%s", ovn_common.OVNControllerCertPath))
      cmd = append(cmd, fmt.Sprintf("--private-key=%s", ovn_common.OVNControllerKeyPath))  
  } else {                                                                                    
      cmd = append(cmd, fmt.Sprintf("--certificate=%s", ovn_common.OVNDbCertPath))            
      cmd = append(cmd, fmt.Sprintf("--private-key=%s", ovn_common.OVNDbKeyPath))             
  } 

}

// Add full-access (non-RBAC) port for SB when TLS is enabled
if instance.Spec.DBType == ovnv1.SBDBType && instance.Spec.TLS.Enabled() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe should also be conditional on RbacCACertSecretName != "", so only if rbac is enabled.

${CTLCMD} set-ssl {{.OVNDB_KEY_PATH}} {{.OVNDB_CERT_PATH}} ${SSL_CA_CERT}
if [[ "${DB_TYPE}" == "sb" ]]; then
# Use RBAC for the connections of the ovn-controller to SB
${CTLCMD} set-connection role=ovn-controller ${DB_SCHEME}:${DB_PORT}:${DB_ADDR}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RBAC role is set on all SB databases when TLS is enabled, even if no RBAC certs exist. Combined with issue mentioned above if RBAC is not enabled (CA set), this means TLS-enabled deployments without RBAC will have the SB reject all ovn-controller connections.
Maybe we should do this?

  {{- if index . "OVN_RBAC_CACERT_PATH" }}
      ${CTLCMD} set-connection role=ovn-controller ${DB_SCHEME}:${DB_PORT}:${DB_ADDR}                                                                                                        
      ...                                                                                                                                     
  {{- else }}                                                                                                                                                                                
      ${CTLCMD} set-connection ${DB_SCHEME}:${DB_PORT}:${DB_ADDR}
  {{- end }}  

Comment on lines +30 to +32
func ComputeSystemID(nodeName string) string {
return uuid.NewSHA1(uuid.NameSpaceDNS, []byte(nodeName)).String()
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it required to use UUID5 system-id ? wouldn't it be easier for debugging if the nodeNames are used/shown?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the current edpm nodes ovn certs already have CN=hostname. if we'd use the same CA and nodename as CN it might make transition/migration easier?

        Subject: O=rootca-ovn, CN=compute-n95n1war-0


// EnsureRbacCert creates or updates a cert-manager Certificate CR for a node's
// RBAC client certificate, and waits for the resulting Secret to become ready.
func EnsureRbacCert(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could use EnsureCert from https://github.com/openstack-k8s-operators/lib-common/blob/main/modules/certmanager/certificate.go#L172 ?

using the cert-duration from the issuer annotation https://github.com/openstack-k8s-operators/lib-common/blob/main/modules/certmanager/issuer.go#L169:

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  annotations:
    cert-duration: 43800h0m0s

Name: "rbac-ca-cert",
VolumeSource: corev1.VolumeSource{
Secret: &corev1.SecretVolumeSource{
SecretName: instance.Spec.RbacCACertSecretName,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iiuc, OVN RBAC checks CN vs system-id, not CA identity. Using the existing rootca-ovn to sign RBAC certs would eliminate: the RbacCACertSecretName field, the RBAC CA volume mount on SB pods, and the combined bundle construction in setup.sh.

Comment on lines +23 to +28
{{- if index . "OVN_RBAC_CACERT_PATH" }}
SSL_CA_CERT=/etc/ovn/combined-ca.pem
cat {{.OVNDB_CACERT_PATH}} {{ index . "OVN_RBAC_CACERT_PATH" }} > ${SSL_CA_CERT}
{{- else }}
SSL_CA_CERT={{.OVNDB_CACERT_PATH}}
{{- end }}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not required if we'd use the dedicated already existing ovn CA

Namespace: instance.Namespace,
Labels: labels,
},
Spec: certmgrv1.CertificateSpec{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are not setting a duration for the cert. maybe it should be longer then the default, like we do with other service certs.

return ctrl.Result{}, err
}
Log.Info("Creating RBAC certificate", "name", certName, "cn", commonName)
err = c.Create(ctx, cert)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one thing which can be followed up is cleanup of RBAC certs for no longer existing nodes?

@slawqo slawqo force-pushed the issue_osprh_1922_v2 branch 2 times, most recently from 3792a57 to cedaa55 Compare May 4, 2026 10:38
@centosinfra-prod-github-app
Copy link
Copy Markdown

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://gateway-cloud-softwarefactory.apps.ocp.cloud.ci.centos.org/zuul/t/rdo/buildset/101b91be51ff46aea15d87c84719977e

openstack-k8s-operators-content-provider FAILURE in 11m 14s
⚠️ ovn-operator-tempest-multinode SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@slawqo slawqo force-pushed the issue_osprh_1922_v2 branch from cedaa55 to 6c119ce Compare May 4, 2026 12:51
@centosinfra-prod-github-app
Copy link
Copy Markdown

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://gateway-cloud-softwarefactory.apps.ocp.cloud.ci.centos.org/zuul/t/rdo/buildset/72eb8d2539cd44a2a21eeafe26578e09

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 25m 49s
ovn-operator-tempest-multinode FAILURE in 1h 06m 37s

Enable OVN role-based access control (RBAC) on the Southbound database
so that ovn-controller nodes can only modify their own chassis rows.

When the openstack-operator provides an RBAC issuer name (from a
dedicated rootca-ovn-rbac CA, see patch [1]), this patch:

* Creates per-node cert-manager Certificate CRs for each ovn-controller
  pod, with CN set to a deterministic UUID5 system-id derived from the
  node name (ComputeSystemID). This CN must match the chassis
  system-id for RBAC to authorize operations.

* Copies the RBAC client cert/key into /etc/openvswitch/ on each node
  via the config job, and switches ovn-controller to use these dedicated
  paths instead of the shared OVN DB cert.

* Mounts the RBAC CA certificate into ovsdbserver-sb pods and builds a
  combined CA bundle (regular CA + RBAC CA) so the SB database can
  verify ovn-controller client certificates.

* Sets role=ovn-controller on the SB DB connection (port 6642) to enforce RBAC.

* Creates a second SB DB listener on port 16642 with full (unrestricted)
  access, used by ovn-northd.

* Updates inactivity probe handling in setup.sh and runtime-config.sh to
  iterate over all connections, since SB now has two listeners.

* ovn-controller POD now waits for the Northd to be ready before start,
  it is done to avoid race condition when ovn-controller POD could be
  started before Northd would populate RBAC rules in the SB DB and that
  could cause issues with connection of the ovn-controller to the SB DB.

* remove OVS DaemonSet readiness gate in the ovncontroller controller -
  it was there to make sure that local ovsdb is up so that config job
  would be able to store config values in it. But init scripts are
  already waiting actively for the ovsdb to become active before it
  anything else will be done. This check is also causing deadlock with
  deploying ovs and ovn-controller PODs now with RBAC enabled as
  ovn-controller needs to have certificates ready to start and create
  br-int brigde. That can't be done if the config job is not started and
  config job couldn't be started because ovncontroller controller was
  waiting for the OVS DaemonSet to be ready.

* Remove OVS DaemonSet readiness gate from the ovncontroller controller -
  the gate ensured that the local ovsdb was running before the config job
  attempted to store configuration values. However, the init scripts
  already poll for ovsdb availability before doing anything else, making
  the gate redundant.
  With RBAC enabled, this gate also causes a deadlock: ovn-controller
  needs its RBAC certificates to start and create the br-int bridge, but
  those certificates are deployed by the config job, which cannot run
  until the OVS DaemonSet is ready — and the OVS DaemonSet cannot become
  ready without br-int.

[1] openstack-k8s-operators/openstack-operator#1906

Related: #OSPRH-1921
Closes: #OSPRH-1922

Signed-off-by: Slawek Kaplonski <skaplons@redhat.com>
@slawqo slawqo force-pushed the issue_osprh_1922_v2 branch from 6c119ce to 1ced423 Compare May 5, 2026 13:20
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 5, 2026

@slawqo: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/ovn-operator-build-deploy-kuttl 6c119ce link true /test ovn-operator-build-deploy-kuttl
ci/prow/precommit-check 1ced423 link true /test precommit-check

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants