Retry DRBD adjust after stale bitmap attach failure #491

Open
kvaps wants to merge 1 commit into LINBIT:master from kvaps:kvaps/retry-adjust-after-stale-bitmap


kvaps commented Apr 3, 2026

Summary

When drbdadm adjust fails during local attach with:

  • already has a bitmap, this should not happen

it means DRBD still holds stale local bitmap state for the target minor. In that case, LINSTOR currently aborts the adjust and leaves the local resource diskless, even though its peers may still be healthy.

This change teaches the satellite to:

  • detect that specific attach failure
  • extract the affected minor from stderr
  • run drbdsetup detach <minor>, falling back to --force if the plain detach fails
  • retry drbdadm adjust once
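The four steps above can be sketched roughly as follows. This is a standalone Java illustration, not the code from this PR: StaleBitmapRetry, CommandRunner, and CmdResult are hypothetical names, and the regex assumes that stderr echoes the failing command as "drbdsetup attach <minor> ...", which may not match the exact output of every drbdadm version.

```java
import java.util.List;
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the detect / extract / detach / retry-once flow.
// Not LINSTOR's actual implementation.
public class StaleBitmapRetry {

    // Exit code plus captured stderr of an external command.
    record CmdResult(int exitCode, String stderr) {
        boolean ok() { return exitCode == 0; }
    }

    // Hypothetical abstraction over process execution, so the logic is testable.
    interface CommandRunner {
        CmdResult run(List<String> argv);
    }

    static final String BITMAP_LEAK_MSG = "already has a bitmap, this should not happen";

    // ASSUMPTION: stderr contains the failing command as "drbdsetup attach <minor> ...".
    private static final Pattern ATTACH_MINOR =
        Pattern.compile("drbdsetup\\s+attach\\s+(\\d{1,9})");

    // Step 1: detect the specific attach failure.
    static boolean isStaleBitmapFailure(String stderr) {
        return stderr != null && stderr.contains(BITMAP_LEAK_MSG);
    }

    // Step 2: extract the affected minor number from stderr, if present.
    static OptionalInt extractMinor(String stderr) {
        Matcher m = ATTACH_MINOR.matcher(stderr);
        return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
    }

    static CmdResult adjustWithRetry(CommandRunner runner, String resource) {
        List<String> adjust = List.of("drbdadm", "adjust", resource);
        CmdResult first = runner.run(adjust);
        // Any other failure (or success) is passed through unchanged.
        if (first.ok() || !isStaleBitmapFailure(first.stderr())) {
            return first;
        }
        OptionalInt minor = extractMinor(first.stderr());
        if (minor.isEmpty()) {
            return first;
        }
        String m = Integer.toString(minor.getAsInt());
        // Step 3: plain detach first, then --force as a fallback.
        CmdResult detach = runner.run(List.of("drbdsetup", "detach", m));
        if (!detach.ok()) {
            detach = runner.run(List.of("drbdsetup", "detach", m, "--force"));
        }
        if (!detach.ok()) {
            return first;
        }
        // Step 4: retry the adjust exactly once.
        return runner.run(adjust);
    }
}
```

Keeping the command runner behind an interface lets the detection, minor extraction, and single-retry logic be unit-tested without a live DRBD stack, which is roughly the kind of coverage the Validation section describes.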

Why

I hit this while recovering DRBD resources after a cluster incident. In practice, LINSTOR showed the resource as unintentionally diskless, while drbdadm status still reported a healthy Primary with peer-disk:UpToDate on the other nodes.

The detach-and-retry path was enough to resynchronize LINSTOR with the actual DRBD device state and allow the local disk to be reattached.

Validation

  • reproduced against LINSTOR 1.33.1
  • verified on a live cluster during recovery
  • added unit coverage for bitmap leak detection / minor extraction

kvaps (author) commented Apr 3, 2026

We've integrated this change into Cozystack as part of cozystack/cozystack#2331
