K8s CRD backend: tcp_port_list not persisted in LayerDrbdResources #489

@kvaps

Bug Description

When using the K8s CRD database backend, the tcp_port_list field in LayerDrbdResources CRD entries is never persisted. This causes TCP port re-allocation on every controller restart, leading to port mismatches between controller and satellites after toggle-disk operations.

Root Cause

Migration 28 (Migration_28_v1_31_1_MoveTcpPortsToNodes) adds the tcp_port_list field to GenCrdV1_31_1.LayerDrbdResourcesSpec and upserts all entries with port values. However, examining a production cluster shows:

  • CRD schema has tcp_port_list: string (correctly)
  • CRD storage version is v1-31-1 (correctly)
  • All 1100+ CRD entries have tcp_port_list missing — both entries created before AND after the migration

The field is defined in GenCrdV1_31_1.LayerDrbdResourcesSpec (line 2190) and dataToCrd at line 624 correctly calls the tcpPortSetter. The GenCrdV1_15_0.LayerDrbdResourcesSpec does NOT have this field.

Possible causes:

  1. The migration upserts through v1-31-1 API, but K8s stored entries in v1-15-0 format (race between schema update and data write)
  2. A subsequent controller operation re-writes CRD entries through a code path that drops the field
  3. K8s's trivial conversion between CRD versions doesn't preserve fields that are unknown to the target version's schema
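
To help tell these causes apart, it may be useful to compare what each CRD version declares with what the API server reports as stored. The following is a minimal sketch, not part of LINSTOR: summarize_crd_versions is a hypothetical helper, and it assumes the field is declared under spec in each version's openAPIV3Schema (consistent with where the check in this issue looks for it in the entries).

```python
import json


def summarize_crd_versions(crd_json: str, field: str = "tcp_port_list") -> dict:
    """Summarize a CustomResourceDefinition document per version.

    crd_json is the output of `kubectl get crd <name> -o json`.
    Assumption: the field lives under spec in each version's schema.
    """
    crd = json.loads(crd_json)
    versions = {}
    for version in crd.get("spec", {}).get("versions", []):
        schema = version.get("schema", {}).get("openAPIV3Schema", {})
        spec_props = (
            schema.get("properties", {}).get("spec", {}).get("properties", {})
        )
        versions[version["name"]] = {
            "served": version.get("served", False),
            "storage": version.get("storage", False),
            "has_field": field in spec_props,
        }
    return {
        "versions": versions,
        # Versions that objects have ever been persisted in, as reported by K8s
        "stored_versions": crd.get("status", {}).get("storedVersions", []),
    }
```

Feeding it the output of kubectl get crd layerdrbdresources.internal.linstor.linbit.com -o json would show whether the version flagged storage: true is one whose schema lacks tcp_port_list, which is the situation causes 1 and 3 describe.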

Impact

Every controller restart re-allocates TCP ports via initPorts() using preferred ports from peers. Since the pool state is empty at startup, ports may be allocated differently than the previous run. This causes:

  • Controller allocates port X for a resource on a node
  • Satellite's DRBD kernel is still bound to old port Y from the previous allocation
  • Satellite's .res file gets regenerated with port X, but drbdadm adjust fails because port X conflicts with another resource that got reassigned
  • Resources stuck in StandAlone/Connecting state after controller restarts
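
The effect can be illustrated with a toy model. This is not LINSTOR's initPorts() logic; allocate_ports, the port range, and the resource names are made up for illustration. It only shows that when nothing is persisted and the pool starts empty, the result depends on the order in which resources are processed.

```python
def allocate_ports(requests, port_range=range(7000, 7010)):
    """Toy allocator (hypothetical, not LINSTOR's algorithm).

    requests is a list of (resource, preferred_port_or_None). A resource
    gets its preferred port if it is free, otherwise the lowest free port.
    """
    in_use = set()
    result = {}
    for resource, preferred in requests:
        if preferred is not None and preferred in port_range and preferred not in in_use:
            port = preferred
        else:
            port = next(p for p in port_range if p not in in_use)
        in_use.add(port)
        result[resource] = port
    return result


# First run: rsc-a is processed first.
first_run = allocate_ports([("rsc-a", None), ("rsc-b", None)])
# After a restart with nothing persisted, the load order happens to differ.
second_run = allocate_ports([("rsc-b", None), ("rsc-a", None)])
# With the previous ports available as preferences, the order no longer matters.
with_persisted = allocate_ports([("rsc-b", 7001), ("rsc-a", 7000)])
```

In this model second_run swaps the two ports relative to first_run, while with_persisted reproduces first_run exactly.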

Reproduction

  1. Use K8s CRD backend
  2. Create DRBD resources
  3. Check CRD entries: kubectl get layerdrbdresources.internal.linstor.linbit.com -o json | python3 -c "import json,sys; data=json.load(sys.stdin); print(sum(1 for i in data['items'] if 'tcp_port_list' not in i.get('spec',{})))"
  4. The printed count equals the total number of entries: every entry is missing tcp_port_list
  5. Restart the controller
  6. Ports are re-allocated, potentially different from what DRBD is using
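
The one-liner in step 3 only prints the number of entries missing the field. A slightly more readable variant that also reports the total could look like the sketch below; count_missing_tcp_port_list is a hypothetical name, and the logic is the same check on spec as in the one-liner.

```python
import json


def count_missing_tcp_port_list(kubectl_json: str) -> tuple[int, int]:
    """Return (missing, total) for a `kubectl get ... -o json` list document."""
    items = json.loads(kubectl_json).get("items", [])
    missing = sum(
        1 for item in items if "tcp_port_list" not in item.get("spec", {})
    )
    return missing, len(items)
```

Pass it the output of kubectl get layerdrbdresources.internal.linstor.linbit.com -o json; if both numbers are equal, no entry has the field persisted.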

Environment

  • LINSTOR Server: v1.32.3
  • K8s CRD API version: v1-15-0 stored, v1-31-1 served
  • Database version: 28 (migration completed)

Suggested Fix

After loading DrbdRscData from DB, if tcp_port_list was null/empty, persist the allocated ports back to the CRD via an upsert. This ensures ports are persisted on the first startup after upgrade.
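
A minimal sketch of that idea, written in Python for brevity rather than as LINSTOR code. DrbdRscRecord, backfill_ports, and the allocate and upsert callbacks are all hypothetical stand-ins for the loaded DrbdRscData, the port allocation, and the CRD upsert.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class DrbdRscRecord:
    """Hypothetical stand-in for one loaded LayerDrbdResources entry."""
    key: str
    tcp_port_list: Optional[list[int]] = None


def backfill_ports(
    records: Iterable[DrbdRscRecord],
    allocate: Callable[[DrbdRscRecord], list[int]],
    upsert: Callable[[DrbdRscRecord], None],
) -> list[str]:
    """Persist ports for every record that was loaded without any.

    Returns the keys that were backfilled. Records that already carry
    ports are left alone, so a second run writes nothing.
    """
    backfilled = []
    for record in records:
        if not record.tcp_port_list:  # covers both None and an empty list
            record.tcp_port_list = allocate(record)
            upsert(record)  # write the allocated ports back to the store
            backfilled.append(record.key)
    return backfilled
```

Because only entries without ports are touched, the write happens once after the upgrade and later restarts would load the persisted values instead of re-allocating.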
