stale GroupVersion discovery for ipam.miloapis.com/v1alpha1 causes namespace controller retry storm #31

Description

@kevwilliams

Summary

The milo-controller-manager namespace controller is retrying at ~5/sec in staging because ipam.miloapis.com/v1alpha1 appears in milo-apiserver discovery but is not backed by a working implementation in this environment.

Error

Every namespace deletion attempt in the namespace controller logs:

"err":"unable to retrieve the complete list of server APIs: ipam.miloapis.com/v1alpha1: stale GroupVersion discovery: ipam.miloapis.com/v1alpha1"
"Unhandled Error","err":"deletion of namespace {root organization-e2e-test-projects-org-1-7a465j} failed: unable to retrieve the complete list of server APIs: ipam.miloapis.com/v1alpha1: stale GroupVersion discovery: ipam.miloapis.com/v1alpha1"

Impact

  • milo-controller-manager namespace workqueue: ~5/sec retries
  • milo-apiserver open_api_v3_aggregation_controller: ~6.5/sec retries (3 pods × ~2.2/sec)
  • Every namespace being deleted (e.g. e2e test cleanup) fails and loops indefinitely
  • Cumulative ~37k retries/hour across the two controllers

Environment

  • Cluster: staging-infrastructure-control-plane
  • milo image: ghcr.io/milo-os/milo:v0.0.0-main-20260618-150757
  • IPAM deployed: No — no Deployment, no APIService object, no infra config for IPAM in staging

Investigation

  • All aggregator_unavailable_apiservice metrics report 0 — ipam.miloapis.com is not registered as an external aggregated APIService
  • milo-apiserver pods were restarted today (15:21 UTC) and the errors continued, so a stale in-memory cache is unlikely to be the cause
  • ipam.miloapis.com/v1alpha1 appears to be natively registered in the milo binary but has no working storage/backing in staging

Likely Root Cause

ipam.miloapis.com/v1alpha1 is natively registered in the milo-apiserver as a built-in API group, but IPAM is not deployed in staging. The group therefore appears in /apis discovery, but any client that attempts to enumerate the resources in that group gets an error.

Suggested Fix

One of:

  1. Remove the native ipam.miloapis.com/v1alpha1 registration from the milo binary if IPAM has been extracted as a standalone service
  2. Make the namespace controller tolerate stale/unavailable GroupVersions by skipping them rather than failing the entire namespace deletion
  3. Deploy IPAM to staging

Labels: bug