fix(drbd): use actual device path in res file during toggle-disk by kvaps · Pull Request #473 · LINBIT/linstor-server

kvaps · 2026-01-06T08:21:50Z

Summary

Fix res file generation during toggle-disk operation from diskless to diskful
Add check for DISK_ADD_REQUESTED/DISK_ADDING flags to use actual device path instead of "none"

Problem

When converting a diskless resource to diskful via toggle-disk -s <storage-pool>, the res file generator outputs disk none because isDrbdDiskless() returns true until the operation completes. This causes drbdadm adjust to fail as the res file doesn't reflect the actual storage device already created by the storage layer.

Solution

Check for DISK_ADD_REQUESTED or DISK_ADDING flags and use the actual device path when toggle-disk operation is in progress.

Test plan

Create diskless DRBD resource with LUKS encryption
Run toggle-disk -s <storage-pool> to convert to diskful
Verify res file contains actual device path, not "none"
Verify drbdadm adjust succeeds
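
The third step of the test plan can be checked mechanically. The following is a minimal sketch, assuming the res file text has already been read into a string; the helper names `backing_disks` and `has_diskless_volume` are hypothetical, and the sample is modelled on the snippet quoted later in this thread with closing braces added for illustration.

```python
import re

# Matches a `disk <value>;` statement. A `disk { ... }` options section
# does not match, because no `;` directly follows the first token.
DISK_LINE = re.compile(r"^\s*disk\s+(\S+?)\s*;", re.MULTILINE)


def backing_disks(res_text):
    """Return the value of every `disk <value>;` statement in the text."""
    return [value.strip('"') for value in DISK_LINE.findall(res_text)]


def has_diskless_volume(res_text):
    """True if any volume is still declared as `disk none;`."""
    return any(value == "none" for value in backing_disks(res_text))


# Illustrative sample only, not the full res file LINSTOR generates.
SAMPLE = """
on "delta"
{
    volume 0
    {
        disk        /dev/mapper/Linstor-Crypt-rsc_00000;
    }
}
"""

print(backing_disks(SAMPLE))
print(has_diskless_volume(SAMPLE))
```

A check like this only inspects the generated text; it does not replace running drbdadm adjust against the file.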

When a diskless resource is being converted to diskful via toggle-disk, the res file generator was still outputting "disk none" because it only checked isDrbdDiskless() which returns true until the operation completes. This caused drbdadm adjust to fail because the res file didn't reflect the actual storage device that was already created by the storage layer.

Add check for DISK_ADD_REQUESTED/DISK_ADDING flags to use the actual device path when toggle-disk operation is in progress.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

## What this PR does

Build piraeus-server (linstor-server) from source with custom patches:

- **adjust-on-resfile-change.diff**: Use actual device path in res file during toggle-disk; fix LUKS data offset
  - Upstream: [#473](LINBIT/linstor-server#473), [#472](LINBIT/linstor-server#472)
- **allow-toggle-disk-retry.diff**: Allow retry and cancellation of failed toggle-disk operations
  - Upstream: [#475](LINBIT/linstor-server#475)
- **force-metadata-check-on-disk-add.diff**: Create metadata during toggle-disk from diskless to diskful
  - Upstream: [#474](LINBIT/linstor-server#474)
- **skip-adjust-when-device-inaccessible.diff**: Skip DRBD adjust/res file regeneration when child layer device is inaccessible
  - Upstream: [#471](LINBIT/linstor-server#471)

Also updates plunger-satellite script and values.yaml for the new build.

### Release note

```release-note
[linstor] Build linstor-server with custom patches for improved disk handling
```

## Summary by CodeRabbit

* **New Features**
  * Added automatic DRBD stall detection and recovery, improving storage resync resilience without manual intervention.
  * Introduced configurable container image references via Helm values for streamlined deployment.
* **Bug Fixes**
  * Enhanced disk toggle operations with retry and cancellation support for better error handling.
  * Improved metadata creation during disk state transitions.
  * Added device accessibility checks to prevent errors when underlying storage devices are unavailable.
  * Fixed LUKS encryption header sizing for consistent deployment across nodes.

ghernadi · 2026-01-14T06:49:12Z

Hello!

I am trying to reproduce this issue but I can't. From your description it sounds like this scenario should always fail, but for me it just works fine:

# linstor --no-utf8 --no-color r l -a
+----------------------------------------------------------------------------------------------+
| ResourceName | Node    | Layers            | Usage  | Conns |    State | CreatedOn           |
|==============================================================================================|
| rsc          | bravo   | DRBD,LUKS,STORAGE | Unused | Ok    | UpToDate | 2026-01-14 07:36:57 |
| rsc          | charlie | DRBD,LUKS,STORAGE | Unused | Ok    | UpToDate | 2026-01-14 07:36:57 |
| rsc          | delta   | DRBD,STORAGE      | Unused | Ok    | Diskless | 2026-01-14 07:36:59 |
+----------------------------------------------------------------------------------------------+
# linstor r td delta rsc -s lvmpool
# linstor --no-utf8 --no-color r l -a
+----------------------------------------------------------------------------------------------+
| ResourceName | Node    | Layers            | Usage  | Conns |    State | CreatedOn           |
|==============================================================================================|
| rsc          | bravo   | DRBD,LUKS,STORAGE | Unused | Ok    | UpToDate | 2026-01-14 07:36:57 |
| rsc          | charlie | DRBD,LUKS,STORAGE | Unused | Ok    | UpToDate | 2026-01-14 07:36:57 |
| rsc          | delta   | DRBD,LUKS,STORAGE | Unused | Ok    | UpToDate | 2026-01-14 07:36:59 |
+----------------------------------------------------------------------------------------------+

Snippet of the resulting .res file from delta:

    on "delta"
    {
        volume 0
        {
            disk        /dev/mapper/Linstor-Crypt-rsc_00000;

root@delta:~# lsblk -s /dev/drbd1000 
NAME                      MAJ:MIN  RM SIZE RO TYPE  MOUNTPOINTS
drbd1000                  147:1000  0  12M  0 disk  
└─Linstor-Crypt-rsc_00000 253:1     0  12M  0 crypt 
  └─scratch-rsc_00000     253:0     0  28M  0 lvm   
    └─vdb                 252:16    0  10G  0 disk

I do not really understand why this would be an issue. Without your patch, the local DRBD is still configured as diskless while the storage device (and, in your case, the LUKS device) is already up and running but ignored for the current device-manager run. In the next devMgrRun the toggle-disk flags should be removed along with the diskless flag, which finally makes the ConfFileBuilder generate the device path. So your patch seems only to change the behavior so that LINSTOR tells DRBD to use the LUKS device one step earlier. Can you elaborate on why this is necessary? I do not see the issue with the current behavior right now, but I could of course be missing something.

Do you have an ErrorReport you could share, or a reproducer, so that we can understand the issue better?
Does my test miss something from your context?

kvaps · 2026-01-14T16:59:58Z

Hi Gabor!

Thank you for testing this. The context I missed in the PR description is that this fix is for retry scenarios, not the happy path.

Important context: RETRY scenarios after PR #472 issues

Both PR #473 and #474 are related to PR #472 (LUKS data offset fix). The problem appears when:

First toggle-disk attempt fails (e.g., due to a LUKS offset mismatch between nodes; see PR #472, "fix(luks): use fixed data offset for consistent LUKS header size")
Storage layer and LUKS device are already created on the local node
User retries toggle-disk (or the retry mechanism from PR #475, "fix(toggle-disk): allow retry and cancellation of failed operations", kicks in)
On retry: LUKS device exists and is open, but res file still says disk none

Why your test works

In your happy path test:

First devMgrRun: res file has disk none, DRBD configured as diskless
Controller receives success, calls markDiskAdded() → removes DRBD_DISKLESS
Second devMgrRun: isDrbdDiskless=false → res file gets correct device path
Everything happens quickly → success

Why retry fails without this patch

In retry scenario:

First attempt: Storage/LUKS created, res file generated with disk none, then operation fails
DRBD_DISKLESS flag is NOT removed (operation didn't complete successfully)
Retry: LUKS device /dev/mapper/Linstor-Crypt-... already exists and is open
But isDrbdDiskless() still returns true → res file still says disk none
drbdadm adjust sees disk none but actual LUKS device is already there
State mismatch → potential issues with metadata creation (PR #474, "fix(drbd): create metadata during toggle-disk from diskless to diskful")
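
The two flows described above (happy path and retry) can be contrasted with a toy model. This is a sketch under stated assumptions, not LINSTOR code: the `Resource` class, `dev_mgr_run`, and the placeholder device path are hypothetical, and the set of flags cleared on success is a simplification of what the thread attributes to markDiskAdded().

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Resource:
    """Hypothetical stand-in for a resource that is mid toggle-disk."""
    flags: set = field(
        default_factory=lambda: {"DRBD_DISKLESS", "DISK_ADD_REQUESTED"}
    )
    device_path: Optional[str] = None


def res_file_disk(rsc):
    """Unpatched behaviour as described: only the diskless flag is checked."""
    return "none" if "DRBD_DISKLESS" in rsc.flags else rsc.device_path


def dev_mgr_run(rsc, operation_succeeds):
    """One simulated device-manager run; returns the disk entry it wrote."""
    # The storage/LUKS layers create the device before the res file is written.
    # The path below is a placeholder, not a real device name.
    rsc.device_path = rsc.device_path or "/dev/mapper/example-device"
    disk = res_file_disk(rsc)
    if operation_succeeds:
        # Assumption: on success the controller clears the diskless flag
        # together with the toggle-disk flags.
        rsc.flags -= {"DRBD_DISKLESS", "DISK_ADD_REQUESTED", "DISK_ADDING"}
    return disk


happy = Resource()
print([dev_mgr_run(happy, True), dev_mgr_run(happy, True)])

retry = Resource()
print([dev_mgr_run(retry, False), dev_mgr_run(retry, False)])
```

In this model the happy path writes "none" once and the device path on the next run, while a failed attempt followed by a retry writes "none" both times even though the device already exists.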

The fix

PR #473 checks for DISK_ADD_REQUESTED/DISK_ADDING flags in ConfFileBuilder.java:

If these flags are set AND dataDevice is available → use actual device path
This ensures res file reflects reality during retry scenarios
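
The condition described above can be sketched as a small decision function. The actual change lives in ConfFileBuilder.java and is not quoted in this thread, so the Python below is only a model: the function name `disk_entry`, its signature, and the error raised for an inconsistent state are assumptions; only the flag names come from the PR description.

```python
from typing import Optional

# Flag names as given in the PR description.
TOGGLE_DISK_FLAGS = frozenset({"DISK_ADD_REQUESTED", "DISK_ADDING"})


def disk_entry(flags, data_device: Optional[str]) -> str:
    """Return the value for the res file's `disk` statement (model only)."""
    diskless = "DRBD_DISKLESS" in flags
    adding_disk = bool(set(flags) & TOGGLE_DISK_FLAGS)

    # Stay on "none" unless a toggle-disk is in progress AND the lower
    # layer has already provided a device.
    if diskless and not (adding_disk and data_device):
        return "none"
    if data_device is None:
        # Hypothetical guard; how LINSTOR treats this state is not
        # covered in the thread.
        raise ValueError("diskful resource without a data device")
    return data_device


print(disk_entry({"DRBD_DISKLESS"}, None))
print(disk_entry({"DRBD_DISKLESS", "DISK_ADD_REQUESTED"}, "/dev/mapper/example-device"))
```

Requiring both the flag and an available device keeps a plain diskless resource on `disk none`, and also covers the window in which a toggle-disk has been requested but the lower layers have not yet created the device.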

How to reproduce

Create a diskless DRBD+LUKS resource with replicas on nodes with different LUKS default offsets
Run toggle-disk → fails due to "Low.dev. smaller than requested DRBD-dev. size" (the issue addressed by PR #472)
Apply the PR #472 fix (or ensure the same LUKS offset on all nodes)
Retry toggle-disk
Without PR #473: res file still has disk none even though the LUKS device exists

Summary

You're right that the happy path works without this patch. The value is in:

Retry scenarios: res file reflects actual device state immediately
Consistency: no mismatch between res file and actual LUKS device
Works with PR #474 ("fix(drbd): create metadata during toggle-disk from diskless to diskful"): metadata creation needs the correct device path in the res file

If you prefer, we can mark this as a low-priority improvement. The critical fixes are PR #474, #476, and #477.

Best regards

kvaps · 2026-04-03T20:58:35Z

Decided not to use these changes for now, closing. Will reopen if the issues arise again.

kvaps mentioned this pull request Jan 6, 2026

[linstor] Build linstor-server with custom patches cozystack/cozystack#1726

Merged

kvaps marked this pull request as ready for review January 6, 2026 09:13

ghernadi mentioned this pull request Jan 14, 2026

fix(drbd): create metadata during toggle-disk from diskless to diskful #474

Closed

5 tasks

kvaps closed this Apr 3, 2026

kvaps wants to merge 1 commit into LINBIT:master from kvaps:fix/toggle-disk-resfile
