Skip to content

Restart VPC with clean-up not applying all LB rules#8765

Closed
SadiJr wants to merge 1 commit intoapache:4.18from
scclouds:fix-restart-vpc-not-applying-all-lb-rules
Closed

Restart VPC with clean-up not applying all LB rules#8765
SadiJr wants to merge 1 commit intoapache:4.18from
scclouds:fix-restart-vpc-not-applying-all-lb-rules

Conversation

@SadiJr
Copy link
Copy Markdown
Contributor

@SadiJr SadiJr commented Mar 8, 2024

Description

If a restart with clean-up is performed in a VPC that has different guest networks and load balancer rules in these networks, only one of the networks has its load balancer rules applied. This PR fixes this behavior.

Fixes #8745

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

How Has This Been Tested?

It's tested in a local lab, following this steps:

  1. Create a VPC with two tiers (test1 and test2);
  2. Create a VM for each one of this tiers;
  3. Acquire one public IP for each one of this tiers;
  4. Create load balancer rules in each public IP for each VM
  5. Apply restart with clean-up in the VPC

Before, only the last tier had its rules applied. With this PR, all LB rules are applied.

List<LoadBalancerVO> lbs = null;
Long vpcId = guestNetwork.getVpcId();
if (vpcId != null) {
lbs = _loadBalancerDao.listByVpcIdAndScheme(vpcId, Scheme.Public);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LB rules will be applied multiple times (=number of vpc tiers). There might be exceptions as some guest nics are missing.

It can be applied when all guest nics are added to the VPC VR

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I understood exactly what you meant; since LB rules are being searched for by VPC ID, exceptions should not occur if there are missing NICs. In the worst case, there will be a rule added to VR of one network that does not exists.

Regarding applying the rules when all guest networks are added to VR, from what I saw of the current ACS workflow, the solution I presented corrects the problem, and I didn't find another part of the code where it can be solved besides this section. If you think that this solution could be applied to another section, could you please point to where that would be?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SadiJr
the method finalizeNetworkRulesForNetwork is called for each guest network when VPC VR is started

for (final Pair<Nic, Network> nicNtwk : guestNics) {
final Nic guestNic = nicNtwk.first();
final long guestNetworkId = guestNic.getNetworkId();
final AggregationControlCommand startCmd = new AggregationControlCommand(Action.Start, domainRouterVO.getInstanceName(), controlNic.getIPv4Address(), _routerControlHelper.getRouterIpInNetwork(
guestNetworkId, domainRouterVO.getId()));
cmds.addCommand(startCmd);
if (reprogramGuestNtwks) {
finalizeIpAssocForNetwork(cmds, domainRouterVO, provider, guestNetworkId, vlanMacAddress);
finalizeNetworkRulesForNetwork(cmds, domainRouterVO, provider, guestNetworkId);
finalizeMonitorService(cmds, profile, domainRouterVO, provider, guestNetworkId, true, routerHealthCheckConfig);
}
finalizeUserDataAndDhcpOnStart(cmds, domainRouterVO, provider, guestNetworkId);
final AggregationControlCommand finishCmd = new AggregationControlCommand(Action.Finish, domainRouterVO.getInstanceName(), controlNic.getIPv4Address(), _routerControlHelper.getRouterIpInNetwork(
guestNetworkId, domainRouterVO.getId()));
cmds.addCommand(finishCmd);
}

If you search for StartCommand in management server logs, you will see multiple LoadBalancerConfigCommand with same parameters.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, the method is indeed called for each guest network, but the LoadBalancerConfigCommand command, from what I've analyzed in the logs, is not the same. Instead, for each guest network that calls such a method, only the load balancer rules for that network are passed in the LoadBalancerConfigCommand command. This is the reason why, in a VPC with multiple guest networks, each of which has its own load balancer rules, only the rules of the last implemented network are applied.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know what caused the issue and why you made the changes.
What I meant is, the process can be optimized. Only one LoadBalancerConfigCommand (with LB of all tiers) is needed, maybe after all other rules are applied.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can understand your point of view, however, considering the criticality and the time it would take me to implement and test your proposal, I'll continue with the current implementation and create an issue to optimize such a process in the future, using your idea. What do you think?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SadiJr
if you do not mind, I can create a new PR for it tomorrow

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, feel free to create an alternative PR, as it allows the community to compare the solutions and choose the best one.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SadiJr if you do not mind, I can create a new PR for it tomorrow

@weizhouapache , would this be an alternative or an addition?

Have you created it?

@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 8, 2024

Codecov Report

❌ Patch coverage is 0% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 13.16%. Comparing base (223a9b8) to head (706f3f6).
⚠️ Report is 78 commits behind head on 4.18.

Files with missing lines Patch % Lines
...ork/router/VirtualNetworkApplianceManagerImpl.java 0.00% 6 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               4.18    #8765      +/-   ##
============================================
- Coverage     13.16%   13.16%   -0.01%     
  Complexity     9205     9205              
============================================
  Files          2724     2724              
  Lines        258149   258153       +4     
  Branches      40235    40236       +1     
============================================
  Hits          33987    33987              
- Misses       219856   219860       +4     
  Partials       4306     4306              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@sureshanaparti sureshanaparti added this to the 4.18.2.0 milestone Mar 11, 2024
@JoaoJandre JoaoJandre added the Severity:Critical Critical bug label Mar 15, 2024
@yadvr yadvr requested a review from JoaoJandre April 4, 2024 09:26
@yadvr
Copy link
Copy Markdown
Member

yadvr commented Apr 4, 2024

This is marked critical, is this a must-have for 4.18.2.0 @JoaoJandre @weizhouapache @DaanHoogland ?

@weizhouapache
Copy link
Copy Markdown
Member

This is marked critical, is this a must-have for 4.18.2.0 @JoaoJandre @weizhouapache @DaanHoogland ?

@rohityadavcloud
I will create a PR soon

@JoaoJandre
Copy link
Copy Markdown
Contributor

This is marked critical, is this a must-have for 4.18.2.0 @JoaoJandre @weizhouapache @DaanHoogland ?

Yes, this is a must-have for 4.18.2.0. Now with #8881 we'll have to discuss which solution is best.

@DaanHoogland
Copy link
Copy Markdown
Contributor

@blueorangutan package

@blueorangutan
Copy link
Copy Markdown

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link
Copy Markdown

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 9171

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants