-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Storage pool monitor disconnect improvements #12398
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 4.20
Are you sure you want to change the base?
Storage pool monitor disconnect improvements #12398
Conversation
- Removes unwanted synchronize as it blocks disconnect for all hosts on management server - Improves logs for the disconnect process
|
@blueorangutan package |
|
@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## 4.20 #12398 +/- ##
============================================
- Coverage 16.23% 16.23% -0.01%
- Complexity 13375 13376 +1
============================================
Files 5657 5657
Lines 498933 498946 +13
Branches 60553 60552 -1
============================================
+ Hits 81012 81014 +2
- Misses 408889 408901 +12
+ Partials 9032 9031 -1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 16313 |
vishesh92
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR enhances the storage pool monitor disconnect process by removing synchronization that blocks disconnects for all hosts and adding comprehensive logging and profiling.
Changes:
- Removed
synchronizedkeyword from processDisconnect methods to prevent blocking - Added detailed debug logging throughout the disconnect flow to track each step
- Implemented profiling to measure disconnect operation duration
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| long disconnectDuration = disconnectProfiler.getDurationInMillis() / 1000; | ||
| logger.debug("Finished disconnect with result {} from Storage Pool {} ({}) ({} of {}) for host: {}, duration: {} secs", disconnectResult, pool.getName(), pool.getUuid(), i, storagePoolHostsSize, host, disconnectDuration); |
Copilot
AI
Jan 12, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The profiling duration calculation uses integer division which will truncate to 0 for operations taking less than 1000 milliseconds. This makes the timing information less useful since most disconnect operations likely complete in under a second. Consider using getDurationInMillis() directly and logging in milliseconds, or convert to a double/float for fractional seconds.
| long disconnectDuration = disconnectProfiler.getDurationInMillis() / 1000; | |
| logger.debug("Finished disconnect with result {} from Storage Pool {} ({}) ({} of {}) for host: {}, duration: {} secs", disconnectResult, pool.getName(), pool.getUuid(), i, storagePoolHostsSize, host, disconnectDuration); | |
| long disconnectDurationMillis = disconnectProfiler.getDurationInMillis(); | |
| logger.debug("Finished disconnect with result {} from Storage Pool {} ({}) ({} of {}) for host: {}, duration: {} ms", disconnectResult, pool.getName(), pool.getUuid(), i, storagePoolHostsSize, host, disconnectDurationMillis); |
| logger.debug("Processing disconnect from Storage Pool {} ({} of {}) for host: {}", storagePoolHost.getPoolId(), i, storagePoolHostsSize, host); | ||
| StoragePoolVO pool = _poolDao.findById(storagePoolHost.getPoolId()); | ||
| if (pool == null) { | ||
| logger.debug("No Storage Pool found with id {} ({} of {}) for host: {}", storagePoolHost.getPoolId(), i, storagePoolHostsSize, host); | ||
| continue; | ||
| } | ||
|
|
||
| if (!pool.isShared()) { | ||
| logger.debug("Storage Pool {} ({}) ({} of {}) is not shared for host: {}, ignore disconnect", pool.getName(), pool.getUuid(), i, storagePoolHostsSize, host); | ||
| continue; | ||
| } | ||
|
|
||
| // Handle only PowerFlex pool for now, not to impact other pools behavior | ||
| if (pool.getPoolType() != StoragePoolType.PowerFlex) { | ||
| logger.debug("Storage Pool {} ({}) ({} of {}) is not of type {} for host: {}, ignore disconnect", pool.getName(), pool.getUuid(), i, storagePoolHostsSize, pool.getPoolType(), host); | ||
| continue; | ||
| } | ||
|
|
||
| logger.debug("Sending disconnect to Storage Pool {} ({}) ({} of {}) for host: {}", pool.getName(), pool.getUuid(), i, storagePoolHostsSize, host); | ||
| Profiler disconnectProfiler = new Profiler(); | ||
| try { | ||
| disconnectProfiler.start(); | ||
| _storageManager.disconnectHostFromSharedPool(host, pool); | ||
| } catch (Exception e) { | ||
| logger.error("Unable to disconnect host {} from storage pool {} due to {}", host, pool, e.toString()); | ||
| disconnectResult = false; | ||
| } finally { | ||
| disconnectProfiler.stop(); | ||
| long disconnectDuration = disconnectProfiler.getDurationInMillis() / 1000; | ||
| logger.debug("Finished disconnect with result {} from Storage Pool {} ({}) ({} of {}) for host: {}, duration: {} secs", disconnectResult, pool.getName(), pool.getUuid(), i, storagePoolHostsSize, host, disconnectDuration); |
Copilot
AI
Jan 12, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The loop index message format is inconsistent. The message shows "i of storagePoolHostsSize" which displays as "0 of 5", "1 of 5", etc. This is confusing because the index is 0-based while the total count is 1-based. Consider using "i+1" to display "1 of 5", "2 of 5" for better readability in log messages.
| try { | ||
| disconnectProfiler.start(); | ||
| _storageManager.disconnectHostFromSharedPool(host, pool); | ||
| } catch (Exception e) { | ||
| logger.error("Unable to disconnect host {} from storage pool {} due to {}", host, pool, e.toString()); | ||
| disconnectResult = false; | ||
| } finally { | ||
| disconnectProfiler.stop(); | ||
| long disconnectDuration = disconnectProfiler.getDurationInMillis() / 1000; | ||
| logger.debug("Finished disconnect with result {} from Storage Pool {} ({}) ({} of {}) for host: {}, duration: {} secs", disconnectResult, pool.getName(), pool.getUuid(), i, storagePoolHostsSize, host, disconnectDuration); | ||
| } |
Copilot
AI
Jan 12, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The disconnectResult variable logged in the finally block at line 207 does not accurately represent the result of the current disconnect operation. The variable is set to false only when an exception occurs, but it retains its previous value (from previous iterations or the initial true value) when the current operation succeeds. This makes the log message misleading. Consider using a local boolean variable to track the result of the current disconnect operation specifically.
| try { | |
| disconnectProfiler.start(); | |
| _storageManager.disconnectHostFromSharedPool(host, pool); | |
| } catch (Exception e) { | |
| logger.error("Unable to disconnect host {} from storage pool {} due to {}", host, pool, e.toString()); | |
| disconnectResult = false; | |
| } finally { | |
| disconnectProfiler.stop(); | |
| long disconnectDuration = disconnectProfiler.getDurationInMillis() / 1000; | |
| logger.debug("Finished disconnect with result {} from Storage Pool {} ({}) ({} of {}) for host: {}, duration: {} secs", disconnectResult, pool.getName(), pool.getUuid(), i, storagePoolHostsSize, host, disconnectDuration); | |
| } | |
| boolean currentDisconnectResult = true; | |
| try { | |
| disconnectProfiler.start(); | |
| _storageManager.disconnectHostFromSharedPool(host, pool); | |
| } catch (Exception e) { | |
| logger.error("Unable to disconnect host {} from storage pool {} due to {}", host, pool, e.toString()); | |
| currentDisconnectResult = false; | |
| } finally { | |
| disconnectProfiler.stop(); | |
| long disconnectDuration = disconnectProfiler.getDurationInMillis() / 1000; | |
| logger.debug("Finished disconnect with result {} from Storage Pool {} ({}) ({} of {}) for host: {}, duration: {} secs", currentDisconnectResult, pool.getName(), pool.getUuid(), i, storagePoolHostsSize, host, disconnectDuration); | |
| } | |
| if (!currentDisconnectResult) { | |
| disconnectResult = false; | |
| } |
|
|
||
| // Handle only PowerFlex pool for now, not to impact other pools behavior | ||
| if (pool.getPoolType() != StoragePoolType.PowerFlex) { | ||
| logger.debug("Storage Pool {} ({}) ({} of {}) is not of type {} for host: {}, ignore disconnect", pool.getName(), pool.getUuid(), i, storagePoolHostsSize, pool.getPoolType(), host); |
Copilot
AI
Jan 12, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The log message has an incorrect argument order. The message template shows "Storage Pool {} ({}) ({} of {}) is not of type {} for host: {}", but the arguments provided are: pool.getName(), pool.getUuid(), i, storagePoolHostsSize, pool.getPoolType(), host. The fifth placeholder expects the pool type that the pool is NOT, but pool.getPoolType() returns the actual type of the pool. The expected type (StoragePoolType.PowerFlex) should be included in the log message instead.
| logger.debug("Storage Pool {} ({}) ({} of {}) is not of type {} for host: {}, ignore disconnect", pool.getName(), pool.getUuid(), i, storagePoolHostsSize, pool.getPoolType(), host); | |
| logger.debug("Storage Pool {} ({}) ({} of {}) is not of type {} for host: {}, ignore disconnect", pool.getName(), pool.getUuid(), i, storagePoolHostsSize, StoragePoolType.PowerFlex, host); |
Description
The PRs improves the following in storage pool monitor disconnect process.
Types of changes
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
Bug Severity
Screenshots (if appropriate):
How Has This Been Tested?
How did you try to break this feature and the system with this change?