fix: Add chart control for updateStrategy to brokers and proxies #668

lhotari merged 1 commit into apache:master

Conversation
A user can control the `updateStrategy` for the pods of bookies and zookeeper. However, the values for brokers and proxies are hardcoded. Being able to control this value via the Helm chart is crucial for smooth chart upgrades of a fully-running cluster. For example, setting the strategy to `OnDelete` would let a user control the order in which pods are restarted after an upgrade. Fixes: apache#667
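For illustration, the override could look like the following in values.yaml. The `broker` and `proxy` key paths mirror the chart's existing convention for bookies and zookeeper, but treat the exact paths as an assumption until checked against the chart:

```yaml
# Hypothetical values.yaml fragment (key paths are assumed):
# pin brokers and proxies to OnDelete so pods only restart
# when an operator deletes them explicitly.
broker:
  updateStrategy:
    type: OnDelete
proxy:
  updateStrategy:
    type: OnDelete
```

With `OnDelete`, a `helm upgrade` updates the StatefulSet pod template but leaves running pods untouched; the operator then restarts them one at a time (e.g. `kubectl delete pod <broker-pod>`), waiting for each pod to rejoin before moving on to the next.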
@lhotari you're very active in the Pulsar repos. I was wondering what you thought of this, and whether you had any thoughts about how to make upgrades more controlled in the future (when using Helm)?
In the issue, you mentioned "Not having control restarts all components at once which renders a fully-operational cluster in a bad error state." I think that wouldn't be expected when rolling restarts are performed, and it's a bug. Please share more details of what type of bad error state you end up in. I know that there are some bugs that could cause this. Sharing the Pulsar version would help to see if there's a fix in newer versions. The client version could also matter in some cases. Sharing that would be helpful too.

In general, it would be useful to perform upgrades "slowly" so that each set of components is handled separately and upgraded before moving on to the next ones. For example, upgrading ZooKeeper, then BookKeeper, and finally Brokers & Proxies. The order doesn't matter that much since newer versions should always be able to talk to older component versions. Even without handling restarts separately, it shouldn't result in the cluster getting into a bad error state unless the high load causes the system to collapse when there are a lot of component restarts at once.

When the Pulsar version is upgraded, one possible solution is to manage the images for the different components separately in values.yaml and not rely on the default that changes the image for all components at once. In that case, one would perform multiple Helm deployments while upgrading. This would work for cases where only the Pulsar image is upgraded. However, if the chart is upgraded, there could be many changes that impact multiple different components and cause them to restart. This is just one thought on some solutions.

It would be great if you could contribute a section to the README.md file about handling upgrades in a controlled way and what problems it resolves.

One known issue with brokers in a full rolling restart is that there's also a lot of shuffling due to load balancing. Bundles get moved across brokers, resulting in disruptions in traffic for producers and consumers until the cluster stabilizes itself. This mainly matters at very high throughput / workload when resources aren't heavily over-provisioned.
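The "manage images separately" idea above can be sketched as per-component tag overrides in values.yaml, bumped across separate `helm upgrade` runs. The key names below are illustrative assumptions, not the chart's confirmed values layout:

```yaml
# Hypothetical per-component image pins (exact keys depend on the chart).
# Step 1: bump only zookeeper, deploy, wait for it to settle.
# Step 2: bump bookie, deploy again; then broker and proxy last.
images:
  zookeeper:
    tag: 3.0.2   # upgraded first
  bookie:
    tag: 3.0.1   # still on the old version during step 1
  broker:
    tag: 3.0.1
  proxy:
    tag: 3.0.1
```

Each deployment then restarts only the components whose rendered manifests actually changed, which keeps the rollout to one tier of the cluster at a time.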
FYI StreamNative cloud has a feature "Graceful Cluster Rollout": https://docs.streamnative.io/private-cloud/v2/configure-private-cloud/advanced/private-cloud-graceful-cluster-rollout . That relies on PIP-192 and PIP-307, besides the StreamNative Operator (commercial product) orchestration.
@lhotari thanks for your quick attention here. I'd be happy to provide more visibility as I learn more about Pulsar--this is my first production implementation of it, and we're still in the tweaking stage, but loving it so far (as compared to Kafka). Here are the versions I'm currently using:

I might have generalized too much about the 'bad state' of the system. What I've generally seen is the fire-and-forget upgrade where multiple components restart at the same time. When that happens, it's more of a stampede problem when you have thousands of topics and busy producers. I've seen it blow up ZK with a flood of lookups, brokers crashing because they can't handle bundle handoffs, and heat on the bookies when the ensembles lose member nodes. All of that together just makes for a bad situation--so far, I've only been able to turn off Pulsar to recover from these situations, as I'm still tuning production on a trial basis.
This PR here is a crude attempt at doing this, but perhaps a more elegant way of rolling the components out is warranted. I agree that the 'slow upgrade' approach feels better in terms of control, and you've offered up a few good suggestions in that area.
Yes, this is it, primarily. ZK might also have a preferred restart order based on the leader, but I think that's a minor thing compared to the traffic stampedes that happen.
I'm happy to contribute anything I can as I learn more about the system's behavior through my own testing. Would you prefer such a contribution prior to any official upgrade controls? It could be just what's worked for me based on this PR's change.
Yes, 100% true. At first, I wasn't able to roll brokers without experiencing super high latency on my publishing tier, which was disastrous. I was able to mitigate this situation by doing a few things:
I'm not sure which (or all) of these helped most, but that fixed my stampede problems with rolling brokers. The last broker to roll is mostly idle when the rolling is done, but then the
Modifications
For Broker and Proxy templates, set a default, but allow overrides from `.Values.xxx.updateStrategy`.

Verifying this change
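The default-with-override approach described above can be sketched as the standard Helm pattern below. This is an illustrative fragment of a broker StatefulSet spec, not the exact diff; the value path `.Values.broker.updateStrategy` is assumed:

```yaml
# Hypothetical StatefulSet spec fragment: render the user-supplied
# .Values.broker.updateStrategy if set, otherwise fall back to the
# previously hardcoded RollingUpdate behavior.
updateStrategy:
{{- if .Values.broker.updateStrategy }}
{{ toYaml .Values.broker.updateStrategy | indent 2 }}
{{- else }}
  type: RollingUpdate
{{- end }}
```

Because `toYaml` renders whatever structure the user supplies, this also allows strategy-specific fields (such as `rollingUpdate.partition`) to pass through without further template changes.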