Connector EnvoyPatchPolicy stuck ResourceNotFound for HTTPS listeners with unissued certs #183

Description

@scotwells

Symptom

Connector EnvoyPatchPolicy (EPP) resources (named connector-<httpproxy>) get stuck at Programmed=False with reason ResourceNotFound whenever the gateway has a custom-hostname HTTPS listener whose TLS certificate has not yet been issued.

The EPP status looks like:

Programmed=False  reason=ResourceNotFound
message: Unable to find xds resources: .../https-hostname-0

Production example: connector-tunnel-29hqn for hostname agent.lkms.app, where the cert has been pending for 62 days (ACME/DNS not yet configured).

Impact

  • The connector EPP stays in a permanent failing state, producing continuous failure noise. It only recovers if/when the certificate issues, so a hostname with pending DNS/ACME leaves the EPP failing indefinitely.
  • The connector's intended "Tunnel not online" 503 fallback route never attaches for that listener.
  • Note: this does NOT block the shared xDS snapshot for other gateways — Envoy Gateway skips the unresolvable patch on a per-patch basis — but it is persistent, misleading failure noise.

Root cause

The connector reconciler emits a RouteConfiguration patch for every HTTPS listener on the gateway with no per-listener certificate/readiness check:

  • gatewayHTTPSRouteConfigNames (internal/controller/connector_routing_compiler.go:201) iterates gateway.Spec.Listeners and filters only on Protocol == HTTPSProtocolType.
  • The only gate, in reconcileConnectorEnvoyPatchPolicy (internal/controller/httpproxy_controller.go:1500-1511), checks just the single default-https listener's Programmed condition. default-https always passes because it uses the shared wildcard cert.

So for a custom-hostname listener (e.g. https-hostname-0) whose cert hasn't issued, NSO still emits a patch targeting RouteConfiguration <ns>/<gateway>/https-hostname-0. Envoy Gateway never materializes that RouteConfiguration (no TLS secret -> listener InvalidCertificateRef -> not Programmed -> no RouteConfiguration), so the patch is unresolvable.
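
To make the failure path concrete, here is a simplified sketch of the filtering described above. It is not the actual source of gatewayHTTPSRouteConfigNames: the Listener type, the function name, the "ns"/"gw" values, and the default-http listener are hypothetical stand-ins, and only the protocol-only filter and the <ns>/<gateway>/<listener> naming come from this issue.

```go
package main

import "fmt"

// Listener is a hypothetical, trimmed-down stand-in for a Gateway listener.
type Listener struct {
	Name     string
	Protocol string
}

// httpsRouteConfigNames mirrors the behavior described in this issue: it keeps
// every HTTPS listener and never looks at certificate or Programmed state.
func httpsRouteConfigNames(namespace, gateway string, listeners []Listener) []string {
	var names []string
	for _, l := range listeners {
		if l.Protocol != "HTTPS" {
			continue
		}
		names = append(names, fmt.Sprintf("%s/%s/%s", namespace, gateway, l.Name))
	}
	return names
}

func main() {
	listeners := []Listener{
		{Name: "default-http", Protocol: "HTTP"}, // hypothetical non-HTTPS listener
		{Name: "default-https", Protocol: "HTTPS"},
		{Name: "https-hostname-0", Protocol: "HTTPS"}, // cert still pending
	}
	// https-hostname-0 is included even though Envoy Gateway will never
	// materialize a RouteConfiguration for it while the cert is missing.
	for _, name := range httpsRouteConfigNames("ns", "gw", listeners) {
		fmt.Println(name)
	}
}
```

With this shape, any patch built from the second returned name has no xDS resource to resolve against, which matches the ResourceNotFound status shown earlier.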

NSO already computes per-listener cert readiness in buildCertificateStatuses (internal/controller/httpproxy_controller.go:1137-1254, cert name <gateway>-<listener>, isCertificateReady), but only uses it for HTTPProxy status — never for the EPP path.

Suggested fix

Gate the connector RouteConfiguration patches per listener: only emit a patch for an HTTPS listener when that listener's cert-manager Certificate is Ready AND the listener is Programmed. The shared-wildcard default-https listener should remain always-eligible.

This is the same "Unable to find xds resources" failure mode that was fixed for TrafficProtectionPolicy in #107 (helpers checkHTTPSListenerCertificatesReady / checkHTTPSListenersProgrammed); that fix was never ported to the connector EPP path. #112 added the current partial gate (default-https only), which does not cover this case. The fix should reuse the #107 helpers but gate per listener rather than all-or-nothing, so default-https keeps working while https-hostname-0 waits for its cert.
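
A minimal sketch of what per-listener gating could look like, under stated assumptions. ListenerState, eligibleForPatch, and gatedRouteConfigNames are hypothetical names used only for illustration; in the real controller the CertReady and Programmed values would presumably come from the existing readiness logic (for example isCertificateReady and the listener's Programmed condition) rather than from plain booleans.

```go
package main

import "fmt"

// ListenerState is a hypothetical per-listener view. In the controller these
// fields would be derived from the cert-manager Certificate
// (<gateway>-<listener>) and the Gateway listener status.
type ListenerState struct {
	Name       string
	Protocol   string
	CertReady  bool // the listener's Certificate is Ready
	Programmed bool // the listener's Programmed condition is True
}

// defaultHTTPSListener uses the shared wildcard cert, so this sketch treats
// it as always eligible, as suggested in this issue.
const defaultHTTPSListener = "default-https"

// eligibleForPatch decides, for one listener, whether a RouteConfiguration
// patch should be emitted.
func eligibleForPatch(l ListenerState) bool {
	if l.Protocol != "HTTPS" {
		return false
	}
	if l.Name == defaultHTTPSListener {
		return true
	}
	return l.CertReady && l.Programmed
}

// gatedRouteConfigNames returns RouteConfiguration names only for listeners
// that pass the per-listener gate; the others are skipped, not treated as errors.
func gatedRouteConfigNames(namespace, gateway string, listeners []ListenerState) []string {
	var names []string
	for _, l := range listeners {
		if !eligibleForPatch(l) {
			continue
		}
		names = append(names, fmt.Sprintf("%s/%s/%s", namespace, gateway, l.Name))
	}
	return names
}

func main() {
	listeners := []ListenerState{
		{Name: "default-https", Protocol: "HTTPS", CertReady: true, Programmed: true},
		{Name: "https-hostname-0", Protocol: "HTTPS"}, // cert pending, not Programmed
	}
	fmt.Println(gatedRouteConfigNames("ns", "gw", listeners)) // prints [ns/gw/default-https]
}
```

Because the gate is evaluated per listener, a pending custom hostname only delays its own patch. Once its certificate issues and the listener becomes Programmed, a later reconcile would pick it up without affecting default-https in the meantime.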
