Skip to content

CLOUDSTACK-9473: storage pool capacity check when volume is resized or migrated#2829

Merged
yadvr merged 2 commits into
apache:4.11from
shapeblue:capacity-calculate-for-migrate-resize
Sep 7, 2018
Merged

CLOUDSTACK-9473: storage pool capacity check when volume is resized or migrated#2829
yadvr merged 2 commits into
apache:4.11from
shapeblue:capacity-calculate-for-migrate-resize

Conversation

@yadvr
Copy link
Copy Markdown
Member

@yadvr yadvr commented Aug 30, 2018

This was previously opened as PR but prematurely closed:
#1931
#1932

Issue: https://issues.apache.org/jira/browse/CLOUDSTACK-9473

Storage pool checker is not being called on resize and migrate volume.
This may lead to allocated percentage of storage above 100%.

Setup:
1 VMware cluster with 2 Hosts.

Executed Steps:

Applied the following global settings:
storage.overprovisioning.factor = 1
pool.storage.allocated.capacity.disablethreshold = 1
pool.storage.capacity.disablethreshold = 1
Restarted management server
Executed Resize and migrate pool and Observed that Storage pool checker is not performed on resizeVolume and migrateVolume.
Result:
Root cause analysis shows storage pool checker is not called when doing migration and resizing.

Signed-off-by: Rohit Yadav rohit.yadav@shapeblue.com

Description

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

GitHub Issue/PRs

Screenshots (if appropriate):

How Has This Been Tested?

Checklist:

  • I have read the CONTRIBUTING document.
  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
    Testing
  • I have added tests to cover my changes.
  • All relevant new and existing integration tests have passed.
  • A full integration testsuite with all test that can run on my environment has passed.

@yadvr yadvr added this to the 4.11.2.0 milestone Aug 30, 2018
@yadvr yadvr requested review from DaanHoogland and rafaelweingartner and removed request for rafaelweingartner August 30, 2018 08:22
…r migrated

Storage pool checker is not being called on resize and migrate volume.
This may lead to allocated percentage of storage above 100%.

Setup:
1 VMware cluster with 2 Hosts.

Executed Steps:

Applied the following global settings:
storage.overprovisioning.factor = 1
pool.storage.allocated.capacity.disablethreshold = 1
pool.storage.capacity.disablethreshold = 1
Restarted management server
Executed Resize and migrate pool and Observed that Storage pool checker is not performed on resizeVolume and migrateVolume.
Result:
Root cause analysis shows storage pool checker is not called when doing migration and resizing.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
@yadvr yadvr force-pushed the capacity-calculate-for-migrate-resize branch from 4860e33 to c2a3410 Compare August 30, 2018 08:34
@yadvr yadvr added the type:bug label Aug 30, 2018
@apache apache deleted a comment from blueorangutan Aug 30, 2018
@yadvr
Copy link
Copy Markdown
Member Author

yadvr commented Aug 30, 2018

@blueorangutan package

@blueorangutan
Copy link
Copy Markdown

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link
Copy Markdown

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-2274

@yadvr
Copy link
Copy Markdown
Member Author

yadvr commented Aug 30, 2018

@blueorangutan test matrix

@blueorangutan
Copy link
Copy Markdown

@rhtyd a Trillian-Jenkins matrix job (centos6 mgmt + xs71, centos7 mgmt + vmware65, centos7 mgmt + kvmcentos7) has been kicked to run smoke tests

if (!checkUsagedSpace(pool)) {
return false;
}
if (s_logger.isDebugEnabled()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this if really needed? Can't you just log to debug and let the filter of log4j handle it?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If log4j does not have debug enabled, then the string concat operation will still be performed by the JVM only when the string is passed to the debug() method it will be ignored. So, it depends if we can afford the CPU/time spent on the operation.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can afford this tiny bit of extra CPU.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think as a matter of principle we should alwas add the check. You never no for what reason someone will alter the message. I like using String.format to construct the message, btw, and then pass the single resulting string to the logger

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What principle?

I also use String.format, and I do not see the problem.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"bla" + "blob" is a costly operation, basically to memcpy()s. String.format() as well in comparison. We don't want those operations performed if we don't end up using the result in a log statement. so the boolean check potentially saves us a lot of cpu cycles, especially in the debug() and trace() cases.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I do know that. However, we should ask ourselves, if the addition of a if conditions (it will clutter the code) is worth the CPU cycles. For me, I would not trade readability for this small CPU cost in a web system (ACS is a at the end of the day a Java web application).

if (!checkUsagedSpace(pool)) {
return false;
}
if (s_logger.isDebugEnabled()) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can afford this tiny bit of extra CPU.

Comment thread server/src/com/cloud/storage/StorageManagerImpl.java

s_logger.debug("Checking pool: " + pool.getId() + " for volume allocation " + volumes.toString() + ", maxSize : " + totalOverProvCapacity + ", totalAllocatedSize : "
+ allocatedSizeWithTemplate + ", askingSize : " + totalAskingSize + ", allocated disable threshold: " + storageAllocatedThreshold);
if (s_logger.isDebugEnabled()) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why checking before logging?

We are removing this from other places. Let's remove conditionals from our code base.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please let's always check before logging any lower priority (below error)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still see no real benefits of adding this log level checks in our context. On the other hand, I do see problems with readability of the code.

I cannot object though, if you want to proceed with this ;)

StoragePoolVO storagePool = _storagePoolDao.findById(volume.getPoolId());
boolean isManaged = storagePool.isManaged();

List<Volume> volumes = new ArrayList<Volume>();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you using this variable?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't think so and also these methods here are to huge for my puny brain

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My brain only works with up to 10 lines of code methods. Anything else is just a blur.

I think the code is ok; we just need to check if this variable is used; if it is not, please remove it.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed, unused code.

Copy link
Copy Markdown
Contributor

@DaanHoogland DaanHoogland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the priciple and also the functional bits of code and the extra log-level checks look good. There are some bits (volumes list) that I don't understand/are not needed and a full integration test awaiting 👍

@blueorangutan
Copy link
Copy Markdown

Trillian test result (tid-2975)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 27836 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr2829-t2975-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_affinity_groups_projects.py
Intermittent failure detected: /marvin/tests/smoke/test_affinity_groups.py
Intermittent failure detected: /marvin/tests/smoke/test_deploy_virtio_scsi_vm.py
Intermittent failure detected: /marvin/tests/smoke/test_privategw_acl.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_redundant.py
Smoke tests completed. 64 look OK, 3 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File
test_DeployVmAntiAffinityGroup_in_project Error 38.33 test_affinity_groups_projects.py
test_DeployVmAntiAffinityGroup Error 53.52 test_affinity_groups.py
test_03_vpc_privategw_restart_vpc_cleanup Failure 1213.67 test_privategw_acl.py

@blueorangutan
Copy link
Copy Markdown

Trillian test result (tid-2970)
Environment: xenserver-71 (x2), Advanced Networking with Mgmt server 7
Total time taken: 31650 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr2829-t2970-xenserver-71.zip
Intermittent failure detected: /marvin/tests/smoke/test_deploy_virtio_scsi_vm.py
Intermittent failure detected: /marvin/tests/smoke/test_scale_vm.py
Smoke tests completed. 65 look OK, 2 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File
ContextSuite context=TestDeployVirtioSCSIVM>:teardown Error 0.00 test_deploy_virtio_scsi_vm.py
test_01_scale_vm Error 19.43 test_scale_vm.py

StoragePoolVO storagePool = _storagePoolDao.findById(volume.getPoolId());
boolean isManaged = storagePool.isManaged();

List<Volume> volumes = new ArrayList<Volume>();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My brain only works with up to 10 lines of code methods. Anything else is just a blur.

I think the code is ok; we just need to check if this variable is used; if it is not, please remove it.

Copy link
Copy Markdown
Contributor

@borisstoyanov borisstoyanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
@yadvr
Copy link
Copy Markdown
Member Author

yadvr commented Sep 4, 2018

Packaging failed due to backend error, rekicking.
@blueorangutan package

@blueorangutan
Copy link
Copy Markdown

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@apache apache deleted a comment from blueorangutan Sep 4, 2018
@apache apache deleted a comment from blueorangutan Sep 4, 2018
@apache apache deleted a comment from blueorangutan Sep 4, 2018
@blueorangutan
Copy link
Copy Markdown

Packaging result: ✖centos6 ✖centos7 ✖debian. JID-2284

@blueorangutan
Copy link
Copy Markdown

Packaging result: ✖centos6 ✖centos7 ✖debian. JID-2285

@blueorangutan
Copy link
Copy Markdown

Packaging result: ✖centos6 ✔centos7 ✔debian. JID-2286

@yadvr
Copy link
Copy Markdown
Member Author

yadvr commented Sep 4, 2018

@blueorangutan test

@blueorangutan
Copy link
Copy Markdown

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@yadvr
Copy link
Copy Markdown
Member Author

yadvr commented Sep 5, 2018

@blueorangutan test

@blueorangutan
Copy link
Copy Markdown

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan
Copy link
Copy Markdown

Trillian test result (tid-2987)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 23411 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr2829-t2987-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_privategw_acl.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_vpn.py
Intermittent failure detected: /marvin/tests/smoke/test_hostha_kvm.py
Smoke tests completed. 65 look OK, 2 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File
test_03_vpc_privategw_restart_vpc_cleanup Failure 1134.77 test_privategw_acl.py
test_hostha_enable_ha_when_host_in_maintenance Error 3.50 test_hostha_kvm.py

@yadvr yadvr merged commit 2ab3976 into apache:4.11 Sep 7, 2018
@yadvr
Copy link
Copy Markdown
Member Author

yadvr commented Sep 7, 2018

Test lgtm

bernardodemarco pushed a commit to scclouds/cloudstack that referenced this pull request Jul 16, 2025
…louds'

Normalização dos valores das configurações globais relacionadas à deleção em _batch_ de registros do banco de dados

Closes apache#2829

See merge request scclouds/scclouds!1226
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants