Add OVN RBAC support with per-node ovn-controller certificates#565
Add OVN RBAC support with per-node ovn-controller certificates#565slawqo wants to merge 1 commit intoopenstack-k8s-operators:mainfrom
Conversation
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: slawqo The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
7ec7d97 to
49c6367
Compare
There was a problem hiding this comment.
I don't think a dedicated rootca-ovn-rbac CA is needed here. OVN RBAC only checks that the client cert CN matches the chassis system-id — it does not care which CA signed the cert. The existing rootca-ovn CA can sign the RBAC certs just fine, which would remove the need for the new CA, the combined CA bundle in setup.sh, and the RbacCACertSecretName field. Or am I missing a point?
This would also make the EDPM side much simpler: the existing per-node EDPM OVN cert is already signed by rootca-ovn and has CN= with client auth usage, so it already works for RBAC as-is. The only thing needed on the EDPM side would be to set external-ids:system-id to the hostname so it matches the cert CN, instead of letting ovn-controller generate a random UUID. No new certs or CA changes needed for EDPM.
Also, I think using CN=nodeName instead of CN=UUID5(nodeName) would be simpler and easier to debug — ovn-sbctl list chassis would show actual node names.
What you think?
[update]
initially I was wondering if we could just do the rbac cert creation in the openstack-op for the ovn-controllers and kust provide the list of cert secrets. but then realized that the ovn-controllers use daemonset (was thinking its a statefulset or so), which makes it harder to predict which nodes it gets scheduled to from there, especially if you use nodeselector, or topologyRef. therefor its probably ok to issue the rbac certs for the ovn-controller k8s side in the operator.
| fmt.Sprintf("--certificate=%s", ovn_common.OVNDbCertPath), | ||
| fmt.Sprintf("--private-key=%s", ovn_common.OVNDbKeyPath), | ||
| fmt.Sprintf("--certificate=%s", ovn_common.OVNControllerCertPath), | ||
| fmt.Sprintf("--private-key=%s", ovn_common.OVNControllerKeyPath), |
There was a problem hiding this comment.
Can we expect RBAC is always enabled? Without RBAC configured (RbacIssuerName empty), no config job copies certs to these paths. This should probably be conditional:
if instance.Spec.RbacIssuerName != "" {
cmd = append(cmd, fmt.Sprintf("--certificate=%s", ovn_common.OVNControllerCertPath))
cmd = append(cmd, fmt.Sprintf("--private-key=%s", ovn_common.OVNControllerKeyPath))
} else {
cmd = append(cmd, fmt.Sprintf("--certificate=%s", ovn_common.OVNDbCertPath))
cmd = append(cmd, fmt.Sprintf("--private-key=%s", ovn_common.OVNDbKeyPath))
}
| } | ||
|
|
||
| // Add full-access (non-RBAC) port for SB when TLS is enabled | ||
| if instance.Spec.DBType == ovnv1.SBDBType && instance.Spec.TLS.Enabled() { |
There was a problem hiding this comment.
maybe should also be conditional on RbacCACertSecretName != "", so only if rbac is enabled.
| ${CTLCMD} set-ssl {{.OVNDB_KEY_PATH}} {{.OVNDB_CERT_PATH}} ${SSL_CA_CERT} | ||
| if [[ "${DB_TYPE}" == "sb" ]]; then | ||
| # Use RBAC for the connections of the ovn-controller to SB | ||
| ${CTLCMD} set-connection role=ovn-controller ${DB_SCHEME}:${DB_PORT}:${DB_ADDR} |
There was a problem hiding this comment.
RBAC role is set on all SB databases when TLS is enabled, even if no RBAC certs exist. Combined with issue mentioned above if RBAC is not enabled (CA set), this means TLS-enabled deployments without RBAC will have the SB reject all ovn-controller connections.
Maybe we should do this?
{{- if index . "OVN_RBAC_CACERT_PATH" }}
${CTLCMD} set-connection role=ovn-controller ${DB_SCHEME}:${DB_PORT}:${DB_ADDR}
...
{{- else }}
${CTLCMD} set-connection ${DB_SCHEME}:${DB_PORT}:${DB_ADDR}
{{- end }}
| func ComputeSystemID(nodeName string) string { | ||
| return uuid.NewSHA1(uuid.NameSpaceDNS, []byte(nodeName)).String() | ||
| } |
There was a problem hiding this comment.
is it required to use UUID5 system-id ? wouldn't it be easier for debugging if the nodeNames are used/shown?
There was a problem hiding this comment.
the current edpm nodes ovn certs already have CN=hostname. if we'd use the same CA and nodename as CN it might make transition/migration easier?
Subject: O=rootca-ovn, CN=compute-n95n1war-0
|
|
||
| // EnsureRbacCert creates or updates a cert-manager Certificate CR for a node's | ||
| // RBAC client certificate, and waits for the resulting Secret to become ready. | ||
| func EnsureRbacCert( |
There was a problem hiding this comment.
I think we could use EnsureCert from https://github.com/openstack-k8s-operators/lib-common/blob/main/modules/certmanager/certificate.go#L172 ?
using the cert-duration from the issuer annotation https://github.com/openstack-k8s-operators/lib-common/blob/main/modules/certmanager/issuer.go#L169:
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
annotations:
cert-duration: 43800h0m0s
| Name: "rbac-ca-cert", | ||
| VolumeSource: corev1.VolumeSource{ | ||
| Secret: &corev1.SecretVolumeSource{ | ||
| SecretName: instance.Spec.RbacCACertSecretName, |
There was a problem hiding this comment.
iiuc, OVN RBAC checks CN vs system-id, not CA identity. Using the existing rootca-ovn to sign RBAC certs would eliminate: the RbacCACertSecretName field, the RBAC CA volume mount on SB pods, and the combined bundle construction in setup.sh.
| {{- if index . "OVN_RBAC_CACERT_PATH" }} | ||
| SSL_CA_CERT=/etc/ovn/combined-ca.pem | ||
| cat {{.OVNDB_CACERT_PATH}} {{ index . "OVN_RBAC_CACERT_PATH" }} > ${SSL_CA_CERT} | ||
| {{- else }} | ||
| SSL_CA_CERT={{.OVNDB_CACERT_PATH}} | ||
| {{- end }} |
There was a problem hiding this comment.
not required if we'd use the dedicated already existing ovn CA
| Namespace: instance.Namespace, | ||
| Labels: labels, | ||
| }, | ||
| Spec: certmgrv1.CertificateSpec{ |
There was a problem hiding this comment.
we are not setting a duration for the cert. maybe it should be longer then the default, like we do with other service certs.
| return ctrl.Result{}, err | ||
| } | ||
| Log.Info("Creating RBAC certificate", "name", certName, "cn", commonName) | ||
| err = c.Create(ctx, cert) |
There was a problem hiding this comment.
one thing which can be followed up is cleanup of RBAC certs for no longer existing nodes?
3792a57 to
cedaa55
Compare
|
Build failed (check pipeline). Post ❌ openstack-k8s-operators-content-provider FAILURE in 11m 14s |
cedaa55 to
6c119ce
Compare
|
Build failed (check pipeline). Post ✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 25m 49s |
Enable OVN role-based access control (RBAC) on the Southbound database so that ovn-controller nodes can only modify their own chassis rows. When the openstack-operator provides an RBAC issuer name (from a dedicated rootca-ovn-rbac CA, see patch [1]), this patch: * Creates per-node cert-manager Certificate CRs for each ovn-controller pod, with CN set to a deterministic UUID5 system-id derived from the node name (ComputeSystemID). This CN must match the chassis system-id for RBAC to authorize operations. * Copies the RBAC client cert/key into /etc/openvswitch/ on each node via the config job, and switches ovn-controller to use these dedicated paths instead of the shared OVN DB cert. * Mounts the RBAC CA certificate into ovsdbserver-sb pods and builds a combined CA bundle (regular CA + RBAC CA) so the SB database can verify ovn-controller client certificates. * Sets role=ovn-controller on the SB DB connection (port 6642) to enforce RBAC. * Creates a second SB DB listener on port 16642 with full (unrestricted) access, used by ovn-northd. * Updates inactivity probe handling in setup.sh and runtime-config.sh to iterate over all connections, since SB now has two listeners. * ovn-controller POD now waits for the Northd to be ready before start, it is done to avoid race condition when ovn-controller POD could be started before Northd would populate RBAC rules in the SB DB and that could cause issues with connection of the ovn-controller to the SB DB. * remove OVS DaemonSet readiness gate in the ovncontroller controller - it was there to make sure that local ovsdb is up so that config job would be able to store config values in it. But init scripts are already waiting actively for the ovsdb to become active before it anything else will be done. This check is also causing deadlock with deploying ovs and ovn-controller PODs now with RBAC enabled as ovn-controller needs to have certificates ready to start and create br-int brigde. That can't be done if the config job is not started and config job couldn't be started because ovncontroller controller was waiting for the OVS DaemonSet to be ready. * Remove OVS DaemonSet readiness gate from the ovncontroller controller - the gate ensured that the local ovsdb was running before the config job attempted to store configuration values. However, the init scripts already poll for ovsdb availability before doing anything else, making the gate redundant. With RBAC enabled, this gate also causes a deadlock: ovn-controller needs its RBAC certificates to start and create the br-int bridge, but those certificates are deployed by the config job, which cannot run until the OVS DaemonSet is ready — and the OVS DaemonSet cannot become ready without br-int. [1] openstack-k8s-operators/openstack-operator#1906 Related: #OSPRH-1921 Closes: #OSPRH-1922 Signed-off-by: Slawek Kaplonski <skaplons@redhat.com>
6c119ce to
1ced423
Compare
|
@slawqo: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Enable OVN role-based access control (RBAC) on the Southbound database so that ovn-controller nodes can only modify their own chassis rows.
When the openstack-operator provides an RBAC issuer name (from a dedicated rootca-ovn-rbac CA, see patch [1]), this patch:
Creates per-node cert-manager Certificate CRs for each ovn-controller pod, with CN set to a deterministic UUID5 system-id derived from the node name (ComputeSystemID). This CN must match the chassis system-id for RBAC to authorize operations.
Copies the RBAC client cert/key into /etc/openvswitch/ on each node via the config job, and switches ovn-controller to use these dedicated paths instead of the shared OVN DB cert.
Mounts the RBAC CA certificate into ovsdbserver-sb pods and builds a combined CA bundle (regular CA + RBAC CA) so the SB database can verify ovn-controller client certificates.
Sets role=ovn-controller on the SB DB connection (port 6642) to enforce RBAC.
Creates a second SB DB listener on port 16642 with full (unrestricted) access, used by ovn-northd.
Updates inactivity probe handling in setup.sh and runtime-config.sh to iterate over all connections, since SB now has two listeners.
[1] openstack-k8s-operators/openstack-operator#1906
Depends-On: openstack-k8s-operators/openstack-operator#1906
Related: #OSPRH-1921
Closes: #OSPRH-1922