fix: disable APF feature flag to prevent readyz-blocking informers #30
scotwells wants to merge 1 commit into
Closing: this fix is not needed. The APF informers only failed because
Re-opening. Neither the quota nor the activity service has a NetworkPolicy, so they can't confirm the
1799a10 to f6f7489
IPAM is a delegating aggregated apiserver: API Priority and Fairness is enforced by the main kube-apiserver, not here.
With APF enabled, FeatureOptions.ApplyTo calls utilflowcontrol.New(), which registers FlowSchema and PriorityLevelConfiguration event handlers on the shared informer factory. Those informers are then counted by the informer-sync readyz check and never reliably sync (they require list/watch on flowcontrol.apiserver.k8s.io against the host apiserver), so /readyz returns 500 forever and the pod never becomes Ready. The aggregation layer then never registers the APIService.
The previous fix nil-ed genericConfig.FlowControl after ApplyTo, but that is too late: the informers are already registered by the time ApplyTo returns.
Set EnablePriorityAndFairness=false before ApplyTo so utilflowcontrol.New() is never called and the informers are never registered, and drop the now-redundant FlowControl=nil line.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
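The readyz behaviour described in this commit message can be illustrated with a small standard-library model. This is only a sketch under stated assumptions: syncChecks, readyzHandler, and probe are hypothetical names, the "ipam" informer is invented for illustration, and nothing here is the real k8s.io/apiserver informer-sync check. It shows only the general idea that one registered informer that never syncs is enough to keep the endpoint at 500.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sort"
	"strings"
)

// syncChecks maps an informer name to a function reporting whether its cache
// has synced. Hypothetical stand-in, not a Kubernetes type.
type syncChecks map[string]func() bool

// readyzHandler answers 200 only when every registered informer reports
// synced; otherwise it answers 500 and lists the informers still pending.
func readyzHandler(checks syncChecks) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var pending []string
		for name, synced := range checks {
			if !synced() {
				pending = append(pending, name)
			}
		}
		if len(pending) > 0 {
			sort.Strings(pending)
			msg := "informers not synced: " + strings.Join(pending, ", ")
			http.Error(w, msg, http.StatusInternalServerError)
			return
		}
		fmt.Fprintln(w, "ok")
	})
}

// probe sends one GET /readyz to the handler and returns the status code.
func probe(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/readyz", nil))
	return rec.Code
}

func main() {
	never := func() bool { return false } // models a list/watch that never succeeds
	always := func() bool { return true }

	// "ipam" is an invented informer name used only for this illustration.
	withAPF := syncChecks{"ipam": always, "FlowSchema": never, "PriorityLevelConfiguration": never}
	withoutAPF := syncChecks{"ipam": always}

	fmt.Println(probe(readyzHandler(withAPF)))    // prints 500
	fmt.Println(probe(readyzHandler(withoutAPF))) // prints 200
}
```

In this model the only way to reach 200 is to make the pending informers sync or to avoid registering them in the first place; the commit takes the second route.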
bb60647 to d69270c
Rebased onto current
What the fix does: sets
On the NetworkPolicy hypothesis (it's a red herring):
Merging this to
Added a second commit for the ingress side, which is the reason the APIService is still unavailable even though the pods are now Ready.
The aggregation front-proxy is in a different namespace than the NetworkPolicy assumes. On Datum staging the apiserver that hosts and proxies
Fix adds an ingress rule allowing
PR now has two commits: (1) disable APF (readyz), (2) allow datum-system ingress (APIService reachability). Both needed for staging to come fully Available.
dc6fc64 to d69270c
Problem
The IPAM apiserver pods are stuck 0/1 Ready in staging. The readiness probe returns HTTP 500 indefinitely.
Why the previous fix (#29) didn't work
PR #29 moved genericConfig.FlowControl = nil to after ApplyTo, reasoning that ApplyTo re-initializes the field. That was correct but incomplete.
The real problem: FeatureOptions.ApplyTo calls utilflowcontrol.New(informers, ...), which registers FlowSchema and PriorityLevelConfiguration event handlers directly on the SharedInformerFactory. Setting FlowControl = nil afterward removes the controller reference but does nothing to the factory: those informers remain registered and appear in the informer-sync readyz check, where they block readyz because the IPAM apiserver has no flowcontrol.apiserver.k8s.io access.
Fix
Set EnablePriorityAndFairness = false on RecommendedOptions.Features in NewIPAMServerOptions(), before ApplyTo is ever called. This causes FeatureOptions.ApplyTo to skip the utilflowcontrol.New() call entirely, so the informers are never registered and readyz is unblocked.
The now-redundant genericConfig.FlowControl = nil is removed.
🤖 Generated with Claude Code
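The ordering argument in the description (nil-ing the field after ApplyTo versus clearing the flag before it) can be sketched with a toy model. Everything below is an assumption made for illustration: informerFactory, features, serverConfig, and applyTo are hypothetical stand-ins that only mimic the side effect described above, and they are not the k8s.io/apiserver types or the actual change in this PR.

```go
package main

import "fmt"

// informerFactory is a toy shared informer factory that only remembers which
// informers were registered on it.
type informerFactory struct{ registered []string }

// features mirrors only the single option discussed in this PR.
type features struct{ enablePriorityAndFairness bool }

// serverConfig holds a placeholder for the APF controller reference.
type serverConfig struct{ flowControl *struct{} }

// applyTo mimics the described side effect: with APF enabled it registers two
// informers on the shared factory and stores a controller on the config.
func (f features) applyTo(c *serverConfig, factory *informerFactory) {
	if !f.enablePriorityAndFairness {
		return
	}
	factory.registered = append(factory.registered, "FlowSchema", "PriorityLevelConfiguration")
	c.flowControl = &struct{}{}
}

// nilAfterApplyTo models the approach from PR #29: the controller reference is
// cleared, but the factory still holds both informers.
func nilAfterApplyTo() []string {
	factory := &informerFactory{}
	cfg := &serverConfig{}
	f := features{enablePriorityAndFairness: true}
	f.applyTo(cfg, factory)
	cfg.flowControl = nil
	return factory.registered
}

// disableBeforeApplyTo models this PR: the flag is off before applyTo runs, so
// nothing is ever registered on the factory.
func disableBeforeApplyTo() []string {
	factory := &informerFactory{}
	cfg := &serverConfig{}
	f := features{enablePriorityAndFairness: false}
	f.applyTo(cfg, factory)
	return factory.registered
}

func main() {
	fmt.Println(len(nilAfterApplyTo()))      // prints 2
	fmt.Println(len(disableBeforeApplyTo())) // prints 0
}
```

The design point the model captures is that registration is a side effect on a shared object, so it has to be prevented up front and cannot be undone by clearing a field on the config afterwards.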