Bug Description
When using the K8s CRD database backend, the tcp_port_list field in LayerDrbdResources CRD entries is never persisted. This causes TCP port re-allocation on every controller restart, leading to port mismatches between controller and satellites after toggle-disk operations.
Root Cause
Migration 28 (Migration_28_v1_31_1_MoveTcpPortsToNodes) adds the tcp_port_list field to GenCrdV1_31_1.LayerDrbdResourcesSpec and upserts all entries with port values. However, examining a production cluster shows:
- CRD schema has tcp_port_list: string (correctly)
- CRD storage version is v1-31-1 (correctly)
- All 1100+ CRD entries have tcp_port_list missing (both entries created before AND after the migration)
The field is defined in GenCrdV1_31_1.LayerDrbdResourcesSpec (line 2190) and dataToCrd at line 624 correctly calls the tcpPortSetter. The GenCrdV1_15_0.LayerDrbdResourcesSpec does NOT have this field.
Possible causes:
- The migration upserts through the v1-31-1 API, but K8s stored the entries in v1-15-0 format (a race between the schema update and the data write)
- A subsequent controller operation rewrites the CRD entries through a code path that drops the field
- K8s's trivial conversion between CRD versions doesn't preserve unknown fields
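To help tell the first cause apart from the others, it may be useful to compare which versions the CRD serves, which one is marked as the storage version, and which versions the API server reports as having been used for persisted objects. The following is a minimal sketch, assuming the CRD definition has been exported as JSON (for example with kubectl get crd layerdrbdresources.internal.linstor.linbit.com -o json). The function name summarize_crd_versions and the sample document are illustrative only and not taken from a real cluster.

```python
def summarize_crd_versions(crd: dict) -> dict:
    """Summarize served, storage, and stored versions of a CRD definition."""
    versions = crd.get("spec", {}).get("versions", [])
    return {
        # Versions the API server accepts requests for.
        "served": [v["name"] for v in versions if v.get("served")],
        # The version new writes are persisted in (exactly one in a valid CRD).
        "storage": [v["name"] for v in versions if v.get("storage")],
        # Versions that have ever been used to persist objects.
        "stored": crd.get("status", {}).get("storedVersions", []),
    }


# Hand-written sample (assumption), shaped like a CustomResourceDefinition.
sample_crd = {
    "spec": {
        "versions": [
            {"name": "v1-15-0", "served": True, "storage": False},
            {"name": "v1-31-1", "served": True, "storage": True},
        ]
    },
    "status": {"storedVersions": ["v1-15-0", "v1-31-1"]},
}

summary = summarize_crd_versions(sample_crd)
print(summary)
```

If the stored list still contains v1-15-0, some objects may have been persisted under the older version, which would be consistent with the first possible cause.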
Impact
Every controller restart re-allocates TCP ports via initPorts() using preferred ports from peers. Since the pool state is empty at startup, ports may be allocated differently than the previous run. This causes:
- Controller allocates port X for a resource on a node
- Satellite's DRBD kernel is still bound to old port Y from the previous allocation
- Satellite's .res file gets regenerated with port X, but drbdadm adjust fails because port X conflicts with another resource that got reassigned
- Resources stuck in StandAlone/Connecting state after controller restarts
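The sequence above can be illustrated with a toy model. This is not LINSTOR's allocator: allocate_ports, the base port 7000, and the resource names are all assumptions chosen for illustration. The sketch only shows how an allocator that starts from an empty pool can hand out different ports when nothing persisted pins them down, for instance when resources are processed in a different order.

```python
def allocate_ports(resources, preferred, base=7000):
    """Toy allocator: honor a preferred port if it is free, else take the lowest free one."""
    used = set()
    result = {}
    for name in resources:
        port = preferred.get(name)
        if port is None or port in used:
            # No usable preference: fall back to the lowest free port.
            port = base
            while port in used:
                port += 1
        used.add(port)
        result[name] = port
    return result


# Same resources, no persisted ports, different processing order.
first_run = allocate_ports(["rsc-a", "rsc-b"], preferred={})
second_run = allocate_ports(["rsc-b", "rsc-a"], preferred={})
print(first_run)
print(second_run)

# With the previous ports available as preferences, the result is stable.
stable_run = allocate_ports(["rsc-b", "rsc-a"], preferred=first_run)
print(stable_run == first_run)
```

In this model the two runs swap the ports of rsc-a and rsc-b, which is the kind of mismatch a satellite would then see against the port its DRBD kernel module is still bound to.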
Reproduction
- Use K8s CRD backend
- Create DRBD resources
- Check CRD entries:
kubectl get layerdrbdresources.internal.linstor.linbit.com -o json | python3 -c "import json,sys; data=json.load(sys.stdin); print(sum(1 for i in data['items'] if 'tcp_port_list' not in i.get('spec',{})))"
- The printed count of entries missing tcp_port_list will equal the total number of entries
- Restart the controller
- Ports are re-allocated, potentially different from what DRBD is using
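For a more readable version of the check in the steps above, the one-liner can be expanded into a small function that also reports which entries are affected. This is a sketch under the assumption that the input has the shape of kubectl get ... -o json output (a top-level items list with metadata.name and spec on each entry). The function name and the sample data are hypothetical.

```python
def missing_tcp_port_list(doc: dict) -> list:
    """Return the names of entries whose spec has no tcp_port_list key."""
    return [
        item.get("metadata", {}).get("name", "<unnamed>")
        for item in doc.get("items", [])
        if "tcp_port_list" not in item.get("spec", {})
    ]


# Hand-written sample (assumption), not real cluster output.
sample = {
    "items": [
        {"metadata": {"name": "entry-1"}, "spec": {"node_name": "n1"}},
        {"metadata": {"name": "entry-2"}, "spec": {"tcp_port_list": "[7000]"}},
        {"metadata": {"name": "entry-3"}},
    ]
}

missing = missing_tcp_port_list(sample)
print(f"{len(missing)} of {len(sample['items'])} entries lack tcp_port_list: {missing}")
```

Against a real cluster, the same function could be fed with json.load(sys.stdin) after piping in the kubectl output from the reproduction step.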
Environment
- LINSTOR Server: v1.32.3
- K8s CRD API version: v1-15-0 stored, v1-31-1 served
- Database version: 28 (migration completed)
Suggested Fix
After loading DrbdRscData from DB, if tcp_port_list was null/empty, persist the allocated ports back to the CRD via an upsert. This ensures ports are persisted on the first startup after upgrade.
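The idea can be sketched in a few lines. The actual fix would live in LINSTOR's Java code; the Python below only models the logic. The names backfill_ports and allocated, the use of metadata.name as the lookup key, and the JSON-array string encoding of tcp_port_list are all assumptions made for illustration.

```python
import json


def backfill_ports(entries, allocated):
    """Fill in tcp_port_list where it is missing or empty; return entries needing an upsert."""
    to_upsert = []
    for entry in entries:
        spec = entry.setdefault("spec", {})
        if spec.get("tcp_port_list"):
            continue  # Already persisted, leave untouched.
        ports = allocated.get(entry["metadata"]["name"])
        if not ports:
            continue  # Nothing was allocated for this entry.
        # Assumed encoding: a JSON array serialized into the string field.
        spec["tcp_port_list"] = json.dumps(sorted(ports))
        to_upsert.append(entry)
    return to_upsert


entries = [
    {"metadata": {"name": "entry-1"}, "spec": {}},
    {"metadata": {"name": "entry-2"}, "spec": {"tcp_port_list": "[7001]"}},
]
allocated = {"entry-1": [7000], "entry-2": [7001]}

pending = backfill_ports(entries, allocated)
print([e["metadata"]["name"] for e in pending])
```

Only entries that lacked the field are returned for writing, so a second run after a successful upsert would find nothing left to do.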