From 59c13bbef4027ade9ab2dfe516fa3f4965fed85f Mon Sep 17 00:00:00 2001
From: zbb88888
Date: Fri, 22 May 2026 14:51:47 +0800
Subject: [PATCH] update bgp with bfd

Signed-off-by: zbb88888
---
 docs/advance/with-bgp.en.md | 93 +++++++++++++++++++++++++++++++++++++
 docs/advance/with-bgp.md    | 93 +++++++++++++++++++++++++++++++++++++
 2 files changed, 186 insertions(+)

diff --git a/docs/advance/with-bgp.en.md b/docs/advance/with-bgp.en.md
index b8ca4f320..a99cafa20 100644
--- a/docs/advance/with-bgp.en.md
+++ b/docs/advance/with-bgp.en.md
@@ -39,6 +39,8 @@ If you only have one switch:
 - --neighbor-ipv6-address=2409:AB00:AB00:2000::AFB:8AFE
 - --neighbor-as=65030
 - --cluster-as=65000
+# Optional: set --allowed-source-addresses so the speaker starts only if its source IP toward the peer is in the whitelist.
+# - --allowed-source-addresses=10.32.32.2,10.32.32.3,10.32.32.4,10.32.32.5
 ```
 
 If you have a pair of switches:
@@ -49,11 +51,14 @@ If you have a pair of switches:
 - --neighbor-ipv6-address=2409:AB00:AB00:2000::AFB:8AFC,2409:AB00:AB00:2000::AFB:8AFD
 - --neighbor-as=65030
 - --cluster-as=65000
+# Optional: set --allowed-source-addresses so the speaker starts only if its source IP toward the peer is in the whitelist.
+# - --allowed-source-addresses=10.32.32.2,10.32.32.3,10.32.32.4,10.32.32.5
 ```
 
 - `neighbor-address`: The address of the BGP Peer, usually the router gateway address.
 - `neighbor-as`: The AS number of the BGP Peer.
 - `cluster-as`: The AS number of the container network.
+- `allowed-source-addresses`: Comma-separated whitelist of source IPs allowed on the speaker node. When set, the speaker performs an `ip route get` lookup to the BGP peer and validates that the selected source IP is in the whitelist. If it doesn't match, the speaker refuses to start. This ensures the correct source IP is used when publishing routes in ECMP environments, preventing return traffic blackholes caused by source IP mismatches.
 
 Apply the YAML:
 
@@ -280,6 +285,94 @@ is handled by a daemon such as `kube-proxy`. The annotation for Services only su
 - `graceful-restart-deferral-time`: BGP Graceful restart deferral time refer to RFC4724 4.1.
 - `passivemode`: The Speaker runs in Passive mode and does not actively connect to the peer.
 - `ebgp-multihop`: The TTL value of EBGP Peer, default is 1.
+- `allowed-source-addresses`: Comma-separated whitelist of source IPs. On startup, the speaker performs an `ip route get` lookup to the BGP peer and validates the kernel-selected source IP against the whitelist. If it doesn't match, the speaker refuses to start. Used in ECMP environments to ensure the correct source IP is used for route publishing.
+
+## BFD Fast Failure Detection
+
+When BGP peers with upstream switches, BFD (Bidirectional Forwarding Detection) can be deployed for rapid link failure detection, enabling failover within seconds when combined with BGP ECMP.
+
+Kube-OVN provides an [openbfdd](https://github.com/authmillenon/openbfdd)-based BFD DaemonSet that establishes BFD sessions between BGP nodes and switch gateways. With the default configuration, failure detection time is `BFD_MULTI * max(BFD_MIN_TX, BFD_MIN_RX) = 3 * 1000ms = 3 seconds`. Tune the parameters for faster or more conservative detection.
+
+> Note: OVN itself also supports BFD, which can be enabled on logical router ports via the `enableBfd` option. OVN BFD is managed automatically by the kube-ovn controller and is a separate mechanism from openbfdd. This section covers host-level BFD based on openbfdd.
+
+The DaemonSet uses `hostNetwork: true` and contains three containers per pod:
+
+- **init-peer**: Init container that discovers the local source IP to the gateway via `ip route get`, validates it against the whitelist, and writes the result to a shared volume.
+- **bfdd**: The openbfdd daemon that uses the local IP discovered by the init container to establish a BFD session with the switch gateway. Initialized via `start-bfdd.sh` at startup; `bfdd-prestart.sh` serves as a startupProbe to validate session parameters.
+- **reconcile**: Reconciliation loop container that checks the BFD session every 5 seconds and automatically re-adds the gateway peer if missing.
+
+Download the corresponding YAML:
+
+```bash
+wget https://raw.githubusercontent.com/kubeovn/kube-ovn/{{ variables.branch }}/yamls/bfdd-daemonset.yaml
+```
+
+Edit `GATEWAY_ADDRESS` and `ALLOWED_SOURCE_ADDRESSES` to match your network. The whitelist should match the speaker's `--allowed-source-addresses` parameter:
+
+```yaml
+env:
+  - name: GATEWAY_ADDRESS
+    value: "10.32.32.1"  # Switch gateway address
+  - name: ALLOWED_SOURCE_ADDRESSES
+    value: "10.32.32.2,10.32.32.3,10.32.32.4,10.32.32.5"  # Source IP whitelist
+  - name: BFD_MIN_TX
+    value: "1000"  # Minimum transmit interval (ms)
+  - name: BFD_MIN_RX
+    value: "1000"  # Minimum receive interval (ms)
+  - name: BFD_MULTI
+    value: "3"  # Detection multiplier
+```
+
+Deploy the DaemonSet:
+
+```bash
+kubectl apply -f bfdd-daemonset.yaml
+```
+
+### BFD Debugging
+
+```bash
+# Check daemon status
+bfdd-control status
+
+# Check specific BFD session (remote=gateway, local=our source IP)
+bfdd-control status remote local
+
+# Add a BFD peer
+bfdd-control allow
+
+# Tune session parameters
+bfdd-control session new set mintx ms  # Set minimum transmit interval
+bfdd-control session new set minrx ms  # Set minimum receive interval
+bfdd-control session new set multi  # Set detection multiplier
+
+# Disable command logging (reduces noise)
+bfdd-control log type command no
+
+# View local IP discovered by init container
+cat /bfdd-peer/local-ip
+
+# Check reconcile sidecar logs
+kubectl logs -n kube-system ds/openbfdd -c reconcile
+
+# Check init container logs
+kubectl logs -n kube-system ds/openbfdd -c init-peer
+```
+
+### OVN-level BFD Debugging
+
+OVN itself also supports BFD (managed by the kube-ovn controller). Use the following commands:
+
+```bash
+# List OVN BFD entries
+kubectl ko nbctl list bfd
+
+# Find BFD by logical router port
+kubectl ko nbctl find bfd logical_port=
+
+# Delete OVN BFD entry by UUID
+kubectl ko nbctl destroy bfd
+```
 
 ## BGP routes debug
 
diff --git a/docs/advance/with-bgp.md b/docs/advance/with-bgp.md
index 889a879e5..613a7ede7 100644
--- a/docs/advance/with-bgp.md
+++ b/docs/advance/with-bgp.md
@@ -33,6 +33,8 @@ wget https://raw.githubusercontent.com/kubeovn/kube-ovn/{{ variables.branch }}/y
 - --neighbor-ipv6-address=2409:AB00:AB00:2000::AFB:8AFE
 - --neighbor-as=65030
 - --cluster-as=65000
+# 可选:设置源 IP 白名单,确保下一跳地址在白名单内
+# - --allowed-source-addresses=10.32.32.2,10.32.32.3,10.32.32.4,10.32.32.5
 ```
 
 如果你有一对交换机:
@@ -43,11 +45,14 @@ wget https://raw.githubusercontent.com/kubeovn/kube-ovn/{{ variables.branch }}/y
 - --neighbor-ipv6-address=2409:AB00:AB00:2000::AFB:8AFC,2409:AB00:AB00:2000::AFB:8AFD
 - --neighbor-as=65030
 - --cluster-as=65000
+# 可选:设置源 IP 白名单,确保下一跳地址在白名单内
+# - --allowed-source-addresses=10.32.32.2,10.32.32.3,10.32.32.4,10.32.32.5
 ```
 
 - `neighbor-address`: BGP Peer 的地址,通常为路由器网关地址。
 - `neighbor-as`: BGP Peer 的 AS 号。
 - `cluster-as`: 容器网络的 AS 号。
+- `allowed-source-addresses`: Speaker 节点允许使用的源 IP 白名单,多个地址用逗号分隔。设置后,Speaker 会通过 `ip route get` 查找到达 BGP Peer 的路由,并验证路由选择的源 IP 是否在白名单内。如果不在白名单中,Speaker 将拒绝启动。该选项用于确保在 ECMP 环境中使用正确的源 IP 发布路由,避免因源 IP 不匹配导致的回程流量黑洞。
 
 部署 yaml:
 
@@ -210,6 +215,94 @@ kubectl annotate eip sample ovn.kubernetes.io/bgp=true
 - `graceful-restart-deferral-time`: BGP Graceful restart deferral time 可参考 RFC4724 4.1。
 - `passivemode`: Speaker 运行在 passive 模式,不主动连接 peer。
 - `ebgp-multihop`: ebgp ttl 默认值为 1。
+- `allowed-source-addresses`: 源 IP 白名单,多个地址用逗号分隔。Speaker 启动时通过 `ip route get` 查找到达 BGP Peer 的路由,验证内核选择的源 IP 是否在白名单内,不在则拒绝启动。用于 ECMP 环境下确保使用正确的源 IP 发布路由。
+
+## BFD 快速故障检测
+
+当 BGP 与上游交换机配合使用时,可以部署 BFD(Bidirectional Forwarding Detection)来实现链路故障的快速检测,配合 BGP ECMP 实现秒级故障切换。
+
+Kube-OVN 提供了基于 [openbfdd](https://github.com/authmillenon/openbfdd) 的 BFD DaemonSet,用于在 BGP 节点上与交换机网关建立 BFD 会话。默认配置下故障检测时间为 `BFD_MULTI * max(BFD_MIN_TX, BFD_MIN_RX) = 3 * 1000ms = 3 秒`,可通过调整参数实现更快或更保守的检测。
+
+> 注意:OVN 自身也支持 BFD,可以在逻辑路由器端口上通过 `enableBfd` 选项启用。OVN BFD 由 kube-ovn controller 自动管理,与 openbfdd 是独立的两套机制。本节介绍的是基于 openbfdd 的主机层面 BFD。
+
+DaemonSet 使用 `hostNetwork: true` 模式部署,每个 Pod 包含三个容器:
+
+- **init-peer**: 初始化容器,通过 `ip route get` 发现到网关的本地源 IP,并验证是否在白名单内,将结果写入共享 Volume。
+- **bfdd**: openbfdd 守护进程,使用 init 容器发现的本地 IP 与交换机网关建立 BFD 会话。启动时通过 `start-bfdd.sh` 脚本初始化,`bfdd-prestart.sh` 作为 startupProbe 验证会话参数。
+- **reconcile**: 对账循环容器,每 5 秒检查一次 BFD 会话状态,如果发现网关 peer 缺失则自动添加。
+
+下载对应的 yaml:
+
+```bash
+wget https://raw.githubusercontent.com/kubeovn/kube-ovn/{{ variables.branch }}/yamls/bfdd-daemonset.yaml
+```
+
+修改 `GATEWAY_ADDRESS` 和 `ALLOWED_SOURCE_ADDRESSES` 以匹配实际网络环境,其中白名单应与 Speaker 的 `--allowed-source-addresses` 参数保持一致:
+
+```yaml
+env:
+  - name: GATEWAY_ADDRESS
+    value: "10.32.32.1"  # 交换机网关地址
+  - name: ALLOWED_SOURCE_ADDRESSES
+    value: "10.32.32.2,10.32.32.3,10.32.32.4,10.32.32.5"  # 源 IP 白名单
+  - name: BFD_MIN_TX
+    value: "1000"  # 最小发送间隔(毫秒)
+  - name: BFD_MIN_RX
+    value: "1000"  # 最小接收间隔(毫秒)
+  - name: BFD_MULTI
+    value: "3"  # 检测倍数
+```
+
+部署 DaemonSet:
+
+```bash
+kubectl apply -f bfdd-daemonset.yaml
+```
+
+### BFD 调试
+
+```bash
+# 查看 BFD daemon 状态
+bfdd-control status
+
+# 查看特定 BFD 会话(remote=网关, local=本机源 IP)
+bfdd-control status remote local
+
+# 添加 BFD 对端
+bfdd-control allow
+
+# 调整会话参数
+bfdd-control session new set mintx ms  # 设置最小发送间隔
+bfdd-control session new set minrx ms  # 设置最小接收间隔
+bfdd-control session new set multi  # 设置检测倍数
+
+# 禁用命令日志(减少噪音)
+bfdd-control log type command no
+
+# 查看 init 容器发现的本地 IP
+cat /bfdd-peer/local-ip
+
+# 查看 reconcile sidecar 日志
+kubectl logs -n kube-system ds/openbfdd -c reconcile
+
+# 查看 init 容器日志
+kubectl logs -n kube-system ds/openbfdd -c init-peer
+```
+
+### OVN 层面 BFD 调试
+
+OVN 自身也支持 BFD(由 kube-ovn controller 管理),可以通过以下命令查看:
+
+```bash
+# 列出 OVN BFD 条目
+kubectl ko nbctl list bfd
+
+# 按逻辑路由器端口查找 BFD
+kubectl ko nbctl find bfd logical_port=
+
+# 删除 OVN BFD 条目
+kubectl ko nbctl destroy bfd
+```
 
 ## BGP routes debug
 
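
As a rough illustration of the arithmetic and the whitelist check this patch documents, the sketch below recomputes the detection time `BFD_MULTI * max(BFD_MIN_TX, BFD_MIN_RX)` and tests a source IP against a comma-separated list. The function names `bfd_detection_ms`, `discover_source_ip`, and `source_ip_allowed` are hypothetical helpers written for this note. They are not taken from `bfdd-daemonset.yaml`, the speaker, or openbfdd, whose actual scripts may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only; not part of the patch.

# Detection time in ms: multiplier * max(min_tx, min_rx).
bfd_detection_ms() {
  local multi=$1 min_tx=$2 min_rx=$3
  local interval=$(( min_tx > min_rx ? min_tx : min_rx ))
  echo $(( multi * interval ))
}

# Ask the kernel which source IP it would use to reach the given address.
# Assumes iproute2 is installed; prints nothing if no "src" field is present.
discover_source_ip() {
  ip -o route get "$1" | sed -n 's/.* src \([^ ]*\).*/\1/p'
}

# Return 0 only if the source IP appears in the comma-separated whitelist.
source_ip_allowed() {
  local src=$1 allowed=$2 ip
  local IFS=','
  for ip in $allowed; do
    [ "$ip" = "$src" ] && return 0
  done
  return 1
}
```

With the defaults from the DaemonSet, `bfd_detection_ms 3 1000 1000` prints `3000`; lowering both intervals to 300 ms would give 900 ms. A startup check in the same spirit could be `source_ip_allowed "$(discover_source_ip 10.32.32.1)" "$ALLOWED_SOURCE_ADDRESSES" || exit 1`.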