fix: [sc-301863] [AKS] Add node pool action is broken with "Automatic" mode#85
fix: [sc-301863] [AKS] Add node pool action is broken with "Automatic" mode#85amandineslx wants to merge 6 commits into
Conversation
pjestin-dku
left a comment
There was a problem hiding this comment.
I'm not sure about this way to fix the issue. This changes the behavior for both cluster creation and node pool addition.
While digging around I found this PR which added the "Automatic" mode, with a comment raising interesting issues: dc710a0#commitcomment-53684049
It seems to me we should never set Automatic as the mode (which is enforced here). However, from what the comment says, it looks like the first node pool was intended to have mode System, and the rest unset mode (i.e. User).
What do you think about moving this logic to the client, i.e. python-clusters/create-aks-cluster/cluster.py for cluster creation and python-runnables/add-node-pool/runnable.py for node pool addition?
I don't see a good way to keep the automatic mode either which means we are in for a breaking change where the users will have to decide whether the node pools should be system or user. Could a cluster be created without initial node pools? If yes, that may help our case. Bc then we can make the automatic mode always user in the macro since the cluster will have at least one node pool (system). |
pjestin-dku
left a comment
There was a problem hiding this comment.
LGTM.
The string Automatic is no longer passed as a literal mode when adding a node pool. We are also changing the behavior when creating a cluster, I think the new behavior will be more flexible 👍
I tested by creating clusters with various node pool configurations: as expected one of the node pools is created as "System" if there is no other.
I also tried the add node pool action: as expected the new node pool is "User" only if the input mode was "Automatic"
Note that in case we are creating a cluster with one node pool in mode "Automatic", then another node pool in mode "System", then both node pools become "System". I don't think think this is an issue.
thtrunck
left a comment
There was a problem hiding this comment.
The code for setting CriticalAddonsOnly=true:NoSchedule is weird we only do that is the mode is System.
You changed the resolution time (now in node_pool_builder.with_mode you passe the resolved mode instead of resolving later).
So this means that for a single automatic nodepool we will have the taint and we won't be able to do anything.
I say that with only looking at the code maybe I'm missing something but I'm provisioning an instance to actually test.
When creating a node pool on a cluster, the following applies:
Systemnode poolOn DSS side, we offer an
Automaticnode pool mode that should behave as:Systemnode pool in the cluster, the newAutomaticone will beUserSystemnode pool in the cluster, the newAutomaticone needs to beSystem