Component System

The component package provides a structured way to manage logical features in a Kubernetes operator by grouping related resources into Components.

A Component acts as a single behavioral unit: it reconciles multiple resources, manages their shared lifecycle, and reports their aggregate health through one condition on the owner CRD.

Building a Component

Components are constructed through a builder. The builder collects resource registrations, configuration, and lifecycle flags, then produces an immutable Component ready for reconciliation.

comp, err := component.NewComponentBuilder().
    WithName("web-interface").
    WithConditionType("WebInterfaceReady").
    WithFeatureGate(webFeature).                                     // optional: disable to remove all resources
    WithPrerequisite(component.DependsOn("DatabaseReady")).   // optional: wait for another component
    WithResource(deployment, component.ResourceOptions{}).
    WithResource(configMap, component.ResourceOptions{ReadOnly: true}).
    WithResource(oldService, component.ResourceOptions{Delete: true}).
    WithGracePeriod(5 * time.Minute).
    Suspend(owner.Spec.Suspended).
    Build()
if err != nil {
    return err
}

Resource Registration Options

Each resource is registered with a ResourceOptions struct that controls how the component interacts with it:

| Option | Behavior |
| --- | --- |
| ResourceOptions{} (default) | Managed: created or updated; health contributes to condition |
| ResourceOptions{ReadOnly: true} | Read-only: fetched but never modified; health still contributes |
| ResourceOptions{Delete: true} | Delete-only: removed from the cluster if present; does not contribute to health |
| ResourceOptions{ParticipationMode: ParticipationModeAuxiliary} | The resource's health does not contribute to the component condition. The component can become Ready regardless of this resource's state. Exception: a blocked guard always contributes to the condition regardless of participation mode, because it halts the entire reconciliation pipeline |
| ResourceOptions{SuppressGraceInconsistencyWarning: true} | Suppresses the warning log emitted when the resource's grace handler returns Healthy while its convergence handler returns non-healthy. Use this when the inconsistency is intentional (e.g., a custom grace handler that deliberately reports Healthy for a resource that has not fully converged) |

Building Resource Options with Feature Gating

When a resource's lifecycle depends on a feature gate or runtime conditions, use ResourceOptionsBuilder to construct the options declaratively. The builder integrates with the feature system so that entire resources can be conditionally created or deleted based on feature state.

opts, err := component.NewResourceOptionsBuilder().
    WithFeatureGate(metricsFeature).
    Auxiliary().
    Build()
if err != nil {
    return err
}

builder.WithResource(exporterService, opts)

The builder evaluates all conditions at Build() time and produces a plain ResourceOptions value. The WithResource signature is unchanged.

Methods:

| Method | Effect |
| --- | --- |
| WithFeatureGate(f feature.Gate) | Gates the resource on a feature. When disabled, the resource is deleted. |
| When(truth bool) | Adds a boolean condition (AND logic). If any condition is false, the resource is deleted. Calls are additive. |
| Auxiliary() | Sets participation mode to Auxiliary (resource does not affect component health). |
| ReadOnly() | Marks the resource as read-only. If the resource is also gated by a disabled feature, deletion takes precedence over read-only. |

For the common case of gating a resource on a single feature, use the convenience function:

opts, err := component.ResourceOptionsFor(tracingFeature)

Resolution rules:

  1. If the feature is non-nil and evaluates to disabled, the resource is deleted.
  2. If any When condition evaluates to false, the resource is deleted.
  3. Deletion takes precedence over read-only mode.
  4. Participation mode is preserved regardless of deletion state.
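The four rules above can be sketched as a small pure function. This is only an illustration of the documented precedence, not the package's implementation: the Gate interface, staticGate, Options, and resolve are hypothetical stand-ins for feature.Gate, ResourceOptions, and the builder's Build() step.

```go
package main

import "fmt"

// Gate is a hypothetical stand-in for feature.Gate; only the
// enabled/disabled outcome matters for this sketch.
type Gate interface {
	Enabled() (bool, error)
}

// staticGate is a trivial Gate used for the demo.
type staticGate bool

func (g staticGate) Enabled() (bool, error) { return bool(g), nil }

// Options mirrors the ResourceOptions fields discussed above (names assumed).
type Options struct {
	Delete    bool
	ReadOnly  bool
	Auxiliary bool
}

// resolve applies the four resolution rules in order.
func resolve(gate Gate, conditions []bool, readOnly, auxiliary bool) (Options, error) {
	opts := Options{Auxiliary: auxiliary} // rule 4: participation mode is always kept
	if gate != nil {
		enabled, err := gate.Enabled()
		if err != nil {
			return Options{}, err
		}
		if !enabled {
			opts.Delete = true // rule 1: disabled feature deletes the resource
		}
	}
	for _, c := range conditions {
		if !c {
			opts.Delete = true // rule 2: any false When condition deletes the resource
		}
	}
	opts.ReadOnly = readOnly && !opts.Delete // rule 3: deletion wins over read-only
	return opts, nil
}

func main() {
	opts, _ := resolve(staticGate(false), nil, true, true)
	fmt.Printf("%+v\n", opts) // prints {Delete:true ReadOnly:false Auxiliary:true}
}
```

Because the result is a plain value, the resolution can be unit-tested without a cluster.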

Example with mixed feature-gated and static resources:

tracingOpts, err := component.ResourceOptionsFor(
    feature.NewVersionGate(owner.Spec.Version, nil).When(owner.Spec.TracingEnabled),
)
if err != nil {
    return err
}

comp, err := component.NewComponentBuilder().
    WithName("api-server").
    WithConditionType("ApiServerReady").
    WithResource(apiDeployment, component.ResourceOptions{}).
    WithResource(jaegerSidecar, tracingOpts).
    Build()

When TracingEnabled is true, the Jaeger sidecar is created and managed. When false, it is deleted from the cluster.

Component Feature Gates

A component-level feature gate controls whether the component is active. When the gate is disabled, the component deletes all of its resources and reports a True condition with reason Disabled. When enabled (or not set), the component reconciles normally.

comp, err := component.NewComponentBuilder().
    WithName("monitoring-sidecar").
    WithConditionType("MonitoringReady").
    WithFeatureGate(monitoringFeature).
    WithResource(exporterDeployment, component.ResourceOptions{}).
    WithResource(exporterService, component.ResourceOptions{}).
    Suspend(owner.Spec.Suspended).
    Build()

A disabled feature gate takes precedence over suspension. If the gate is disabled and the component is also marked suspended, the component is treated as disabled (resources are deleted), not suspended.

The condition when the gate is disabled:

type: MonitoringReady
status: "True"
reason: Disabled
message: "Component is disabled."

The True status follows the convention that True means "in its expected state", consistent with how a Suspended component also reports True.

Prerequisites

Prerequisites are initialization barriers that prevent a component from reconciling until a condition is met. Unlike resource-level guards, prerequisites are evaluated only while the component's condition reason indicates it has not yet proceeded past initialization. The barrier remains active while the condition reason is Unknown, PrerequisiteNotMet, Disabled, or FeatureGateError. Once the reason changes to any other value, the barrier is permanently passed and the prerequisite is never re-evaluated.

This makes prerequisites suitable for expressing startup dependencies between components. If a dependency later becomes unhealthy, the dependent component continues to reconcile its own resources. Prerequisites answer the question "can this component be created?", not "should this component keep running?".

Registering Prerequisites

Prerequisites are registered on the component builder using WithPrerequisite. Multiple prerequisites can be registered; all must be satisfied before the component proceeds.

comp, err := component.NewComponentBuilder().
    WithName("api-server").
    WithConditionType("ApiServerReady").
    WithPrerequisite(component.DependsOn("DatabaseReady")).
    WithPrerequisite(component.DependsOn("CacheReady")).
    WithResource(apiDeployment, component.ResourceOptions{}).
    WithResource(apiService, component.ResourceOptions{}).
    Suspend(owner.Spec.Suspended).
    Build()

The built-in DependsOn helper checks whether a named condition on the owner object has Status: True. The owner is read from the ReconcileContext passed to Check, so no cluster reads are performed.

For custom logic, implement the Prerequisite interface:

type Prerequisite interface {
    Check(rec ReconcileContext) (PrerequisiteResult, error)
}

Prerequisite Behavior

  • Prerequisites are evaluated before any resources are reconciled or suspended.
  • The barrier is considered active when the component's condition reason is Unknown, PrerequisiteNotMet, Disabled, or FeatureGateError. Any other reason means the component has proceeded past initialization and the barrier is permanently passed.
  • While the barrier is active, suspension is a no-op. No resources exist to suspend.
  • A feature gate check runs before the prerequisite check. If the gate is disabled, prerequisites are not evaluated.
  • Prerequisites are evaluated in registration order. The first unmet prerequisite short-circuits the check.
  • A prerequisite error sets the component condition to False with reason PrerequisiteNotMet.
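The barrier and short-circuit rules above can be sketched as follows. This is a simplified model, not the package's code: the prerequisite function type, dependsOn, and checkPrerequisites are hypothetical, and owner conditions are reduced to a map from condition type to whether its status is True.

```go
package main

import "fmt"

// barrierReasons lists the condition reasons that keep the barrier active.
var barrierReasons = map[string]bool{
	"Unknown":            true,
	"PrerequisiteNotMet": true,
	"Disabled":           true,
	"FeatureGateError":   true,
}

// barrierActive reports whether prerequisites still need to be evaluated.
func barrierActive(reason string) bool { return barrierReasons[reason] }

// prerequisite is a hypothetical simplification of the Prerequisite interface.
type prerequisite func(conditions map[string]bool) (met bool, message string)

// dependsOn mimics the documented DependsOn helper against the simplified map.
func dependsOn(conditionType string) prerequisite {
	return func(conditions map[string]bool) (bool, string) {
		if conditions[conditionType] {
			return true, ""
		}
		return false, fmt.Sprintf("waiting for condition %q to become True", conditionType)
	}
}

// checkPrerequisites evaluates in registration order and stops at the first
// unmet prerequisite. Once the barrier has been passed, nothing is evaluated.
func checkPrerequisites(reason string, conditions map[string]bool, prereqs []prerequisite) (bool, string) {
	if !barrierActive(reason) {
		return true, ""
	}
	for _, p := range prereqs {
		if met, msg := p(conditions); !met {
			return false, msg
		}
	}
	return true, ""
}

func main() {
	prereqs := []prerequisite{dependsOn("DatabaseReady"), dependsOn("CacheReady")}
	owner := map[string]bool{"DatabaseReady": false, "CacheReady": false}

	ok, msg := checkPrerequisites("Unknown", owner, prereqs)
	fmt.Println(ok, msg) // prints: false waiting for condition "DatabaseReady" to become True

	// After the component has moved on (here: Creating), the same unmet
	// dependencies no longer hold it back.
	ok, _ = checkPrerequisites("Creating", owner, prereqs)
	fmt.Println(ok) // prints: true
}
```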

Status Reporting

A blocked prerequisite produces a condition like:

type: ApiServerReady
status: "False"
reason: PrerequisiteNotMet
message:
  'Prerequisite not met: waiting for condition "DatabaseReady" to become True (currently False: Database is still
  creating resources)'

Reconciliation Lifecycle

comp.Reconcile(ctx, recCtx) runs a multi-phase process on every call:

Phase 1: Feature gate check. If a feature gate is set and disabled, all resources managed by the component are deleted and the condition is set to True/Disabled. No further processing occurs.

Phase 2: Prerequisite check. If prerequisites are registered and the initialization barrier has not yet been passed (condition reason is Unknown, PrerequisiteNotMet, Disabled, or FeatureGateError), all prerequisites are evaluated. If any prerequisite is not met, the condition is set to False/PrerequisiteNotMet and no resources are reconciled or suspended.

Phase 3: Suspension check. If the component is marked suspended, it calls Suspend() on all managed resources that support suspension (create/update resources, not read-only ones), updates the condition, then processes any pending deletions and returns. The remaining phases are skipped.

Phase 4: Resource reconciliation. All non-delete resources are processed sequentially in registration order, regardless of whether they are managed or read-only. For each resource:

  1. If the resource has a guard, the guard is evaluated first. If blocked, the resource and all subsequent resources are skipped.
  2. The resource is either applied to the cluster (managed) or fetched from it (read-only). Managed resources use Server-Side Apply and get a controller owner reference pointing to the owner CRD, unless the resource is cluster-scoped and the owner is namespace-scoped (see Cluster-Scoped Resources).
  3. If the resource implements DataExtractable, its data extractors run immediately. This makes extracted data available to subsequent resources' guards and mutations within the same reconciliation cycle.

This means a read-only resource registered before a managed resource can extract data that feeds into the managed resource's guard or mutations.

Phase 5: Status aggregation and condition update. The health of each resource is collected, the grace period is consulted, and a single aggregate condition is written to the owner object's status.

Phase 6: Resource deletion. Resources registered for deletion are removed from the cluster.
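The ordering of the early-exit phases can be summarized as a decision function. The sketch below is illustrative only: the state struct and plan function are hypothetical and model which path a single Reconcile call takes, not how the package performs the work.

```go
package main

import "fmt"

// state captures the inputs that decide which path a reconcile call takes.
// All field names are illustrative.
type state struct {
	GateDisabled     bool // a feature gate is set and evaluates to disabled
	BarrierActive    bool // condition reason is still an initialization reason
	PrerequisitesMet bool // all registered prerequisites are satisfied
	Suspended        bool // the component is marked suspended
}

// plan lists the work one reconcile call would perform, in order.
func plan(s state) []string {
	switch {
	case s.GateDisabled: // phase 1 ends the call
		return []string{"delete all resources", "set condition True/Disabled"}
	case s.BarrierActive && !s.PrerequisitesMet: // phase 2 ends the call
		return []string{"set condition False/PrerequisiteNotMet"}
	case s.Suspended: // phase 3 skips resource reconciliation
		return []string{"suspend resources", "update condition", "process deletions"}
	default: // phases 4 to 6
		return []string{"reconcile resources", "aggregate status", "process deletions"}
	}
}

func main() {
	// A disabled gate wins even when the component is also suspended.
	fmt.Println(plan(state{GateDisabled: true, Suspended: true}))
	// While the barrier is active and unmet, suspension is a no-op.
	fmt.Println(plan(state{BarrierActive: true, Suspended: true}))
	fmt.Println(plan(state{Suspended: true}))
}
```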

Cluster-Scoped Resources

When a component manages cluster-scoped resources (e.g., ClusterRole, PersistentVolume) and the owner CRD is namespace-scoped, the framework automatically skips setting a controller owner reference on those resources. This is a Kubernetes API constraint: a namespace-scoped object cannot own a cluster-scoped object.

The scope of both the owner and the resource is determined at reconcile time using the cluster's REST mapper. No configuration is needed; the framework detects the incompatibility and logs an info-level message.

Garbage collection caveat: Without an owner reference, cluster-scoped resources are not automatically deleted when the owner is removed. To ensure cleanup, either:

  • Register the resource with ResourceOptions{Delete: true} so it is removed during reconciliation when no longer needed.
  • Use a finalizer on the owner CRD to clean up cluster-scoped resources before the owner is deleted.

If the owner CRD is itself cluster-scoped, owner references are set normally on all resources regardless of their scope.
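The scope rule reduces to a single forbidden combination. The helper below is a hypothetical sketch that takes the two scopes as plain booleans; in the framework they are derived from the cluster's REST mapper at reconcile time.

```go
package main

import "fmt"

// canSetOwnerRef reports whether a controller owner reference may be set.
// The only forbidden combination is a namespace-scoped owner with a
// cluster-scoped resource.
func canSetOwnerRef(ownerNamespaced, resourceNamespaced bool) bool {
	return !ownerNamespaced || resourceNamespaced
}

func main() {
	fmt.Println(canSetOwnerRef(true, true))   // namespaced owner, namespaced resource: true
	fmt.Println(canSetOwnerRef(true, false))  // namespaced owner, cluster-scoped resource: false
	fmt.Println(canSetOwnerRef(false, false)) // cluster-scoped owner, cluster-scoped resource: true
	fmt.Println(canSetOwnerRef(false, true))  // cluster-scoped owner, namespaced resource: true
}
```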

Status Model

The status values a component reports depend on which lifecycle interfaces its resources implement. The component aggregates across all registered resources and surfaces the most critical state.

Alive Resources (Alive interface)

Reported by long-running workloads (Deployments, StatefulSets, DaemonSets):

| State | Meaning |
| --- | --- |
| Healthy | The resource has reached its desired state |
| Creating | The resource is being provisioned for the first time |
| Updating | The resource is being modified with new configuration |
| Scaling | The resource is changing its replica count |
| Failing | The resource is failing to converge to its desired state |

Completable Resources (Completable interface)

Reported by run-to-completion resources (Jobs, tasks):

| State | Meaning |
| --- | --- |
| Completed | The resource finished successfully |
| TaskRunning | The resource is currently executing |
| TaskPending | The resource is waiting to start |
| TaskFailing | The resource finished with an error |

Operational Resources (Operational interface)

Reported by integration resources whose readiness depends on external systems (Services, Ingresses, Gateways, CronJobs):

| State | Meaning |
| --- | --- |
| Operational | The resource is fully operational |
| OperationPending | The resource is waiting on an external dependency |
| OperationFailing | The resource failed to reach an operational state |

Static Resources (no interface)

Resources that implement none of the above interfaces are considered ready as long as they exist in the cluster. If a static resource has a guard, it can report Blocked when the guard precondition is not met.

Grace States

When a component has a grace period configured and a Graceful resource has not reached its target state within that period, the Graceful interface determines the post-expiry severity:

| State | Meaning |
| --- | --- |
| Healthy | The resource is healthy (grace period expired without issue) |
| Degraded | The resource is partially functional or convergence is taking longer than expected |
| Down | The resource is completely non-functional |

Suspension States

Reported during intentional deactivation:

| State | Meaning |
| --- | --- |
| PendingSuspension | Suspension is acknowledged but has not started |
| Suspending | Resources are actively being scaled down or cleaned up |
| Suspended | All resources have reached their suspended state |

Guard State

| State | Meaning |
| --- | --- |
| Blocked | A resource's guard precondition is not met; it and subsequent resources wait |

See Guards for details.

Prerequisite State

| State | Meaning |
| --- | --- |
| PrerequisiteNotMet | A component-level prerequisite is not satisfied; no resources have been reconciled |

See Prerequisites for details.

Feature Gate State

| State | Meaning |
| --- | --- |
| Disabled | The component's feature gate is disabled; all resources deleted |

See Component Feature Gates for details.

Condition Priority

When aggregating across multiple resources, the most critical state wins:

  1. Error / Down / Degraded: something is wrong
  2. Suspension states: the component is intentionally inactive
  3. Disabled: the component is intentionally removed by a feature gate
  4. Blocked / PrerequisiteNotMet: a precondition is not met
  5. Converging states (Creating, Updating, Scaling, TaskRunning, TaskPending, OperationPending): the component is progressing
  6. Healthy / Completed / Operational: all resources are in their target state
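The priority list above amounts to a "lowest rank wins" reduction. The following sketch assumes states are plain strings and that ties within a rank resolve to the first state seen; both are simplifications made for illustration, and the priority map and aggregate function are hypothetical names.

```go
package main

import "fmt"

// priority maps each state from the list above to its rank; lower rank wins.
var priority = map[string]int{
	"Error": 1, "Down": 1, "Degraded": 1,
	"PendingSuspension": 2, "Suspending": 2, "Suspended": 2,
	"Disabled": 3,
	"Blocked":  4, "PrerequisiteNotMet": 4,
	"Creating": 5, "Updating": 5, "Scaling": 5,
	"TaskRunning": 5, "TaskPending": 5, "OperationPending": 5,
	"Healthy": 6, "Completed": 6, "Operational": 6,
}

// aggregate returns the most critical state among the inputs.
// States that do not appear in the list above are rejected rather than guessed.
func aggregate(states []string) (string, error) {
	if len(states) == 0 {
		return "", fmt.Errorf("no states to aggregate")
	}
	winner, best := "", 0
	for _, s := range states {
		rank, ok := priority[s]
		if !ok {
			return "", fmt.Errorf("state %q has no documented priority", s)
		}
		if winner == "" || rank < best {
			winner, best = s, rank
		}
	}
	return winner, nil
}

func main() {
	got, _ := aggregate([]string{"Healthy", "Creating", "Operational"})
	fmt.Println(got) // prints Creating

	got, _ = aggregate([]string{"Creating", "Blocked", "Degraded"})
	fmt.Println(got) // prints Degraded
}
```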

Grace Period

The grace period defines how long a component may remain in a converging state (such as Creating, Updating, or Scaling) before transitioning to Degraded or Down.

component.NewComponentBuilder().
    WithGracePeriod(5 * time.Minute).
    // ...

During the grace period the component reports its real converging state, not a failure. After the period expires, if the component is still not Ready, the framework escalates to Degraded or Down based on resource health.

This prevents spurious failure alerts during normal operations like rolling updates.

Suspension Lifecycle

Suspension allows a component to be intentionally deactivated without deleting its configuration. When Suspend(true) is set on the builder:

  1. The component calls Suspend() on all Suspendable resources.
  2. Each resource performs its suspension behavior, typically scaling to zero replicas.
  3. The component polls SuspensionStatus() on each resource.
  4. Once all resources report Suspended, the condition transitions to Suspended.

Resources that do not yet exist in the cluster are created in their suspended state (with suspension mutations already applied). For example, a Deployment is created with zero replicas. This ensures the resource is immediately available when suspension ends.

Resources with DeleteOnSuspend enabled are not created if they are already absent. Their absence is treated as already suspended. This avoids a create→delete churn loop on every reconcile while the component remains suspended.

Resources that are not Suspendable are left in place.
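The per-resource decisions described above can be sketched as a switch. The resource struct, suspendAction, and allSuspended are hypothetical and only model the documented rules; they do not reflect the package's internal types.

```go
package main

import "fmt"

// resource holds the facts that drive the suspension decision (names assumed).
type resource struct {
	Name            string
	Suspendable     bool
	DeleteOnSuspend bool
	Exists          bool // whether the object is present in the cluster
}

// suspendAction returns what the suspension path does with one resource.
func suspendAction(r resource) string {
	switch {
	case !r.Suspendable:
		return "leave in place"
	case r.DeleteOnSuspend && !r.Exists:
		return "skip (absence counts as suspended)"
	case !r.Exists:
		return "create in suspended state"
	default:
		return "call Suspend()"
	}
}

// allSuspended reports whether every polled suspension status is Suspended,
// which is when the component condition transitions to Suspended.
func allSuspended(statuses []string) bool {
	for _, s := range statuses {
		if s != "Suspended" {
			return false
		}
	}
	return true
}

func main() {
	for _, r := range []resource{
		{Name: "deployment", Suspendable: true, Exists: true},
		{Name: "new-deployment", Suspendable: true},
		{Name: "job", Suspendable: true, DeleteOnSuspend: true},
		{Name: "configmap"},
	} {
		fmt.Printf("%s: %s\n", r.Name, suspendAction(r))
	}
	fmt.Println(allSuspended([]string{"Suspended", "Suspending"})) // prints false
}
```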

ReconcileContext

ReconcileContext carries all dependencies for a reconciliation pass. Pass it from your controller on each call:

recCtx := component.ReconcileContext{
    Client:   r.Client,    // sigs.k8s.io/controller-runtime/pkg/client
    Scheme:   r.Scheme,    // *runtime.Scheme
    Recorder: r.Recorder,  // record.EventRecorder
    Metrics:  r.Metrics,   // component.Recorder (condition metrics)
    Owner:    owner,       // the CRD that owns this component
}

err = comp.Reconcile(ctx, recCtx)

Dependencies are passed explicitly so components remain testable and decoupled from global state.

The Metrics field is required. The framework records Prometheus metrics for every condition state transition during reconciliation. The recorder implementation is provided by go-crd-condition-metrics.

Guards

Guards allow resources within a component to express runtime dependencies on each other. A guard is a precondition function registered on a resource that is evaluated before the resource is applied. If the guard returns Blocked, the resource and all resources registered after it are skipped for that reconciliation cycle.

Combined with per-resource data extraction, guards enable indirect dependency graphs: Resource A is applied first, its data extractor runs and populates a shared variable, and Resource B's guard checks that variable before allowing B to proceed.

Registering a Guard

Guards are registered on the resource builder using WithGuard. The guard function receives a copy of the resource object and returns a GuardStatusWithReason together with an error.

The following example shows the complete pattern. A cloud provider role resource extracts its ARN after being applied. A bucket resource uses that ARN in its spec and guards against being applied before the ARN is available:

func (r *MyReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    // ...fetch owner and build recCtx (see ReconcileContext)...

    // roleARN is scoped to this reconcile call. The role resource's data extractor
    // populates it after the role is applied. Because extraction runs per-resource
    // (not after all resources), roleARN is set before the bucket's guard evaluates.
    var roleARN string

    comp, err := buildCloudComponent(owner, &roleARN)
    if err != nil {
        return ctrl.Result{}, err
    }
    return ctrl.Result{}, comp.Reconcile(ctx, recCtx)
}

func buildCloudComponent(owner *v1alpha1.MyApp, roleARN *string) (*component.Component, error) {
    // First resource: the cloud provider role.
    // After it is applied, the data extractor reads the ARN from the object.
    roleRes, err := static.NewBuilder(newCloudRole(owner)).
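        WithDataExtractor(func(obj uns.Unstructured) error {
            // status.arn is absent until the provider populates it, so use
            // checked assertions; an unchecked assertion would panic here.
            status, _ := obj.Object["status"].(map[string]any)
            if arn, ok := status["arn"].(string); ok {
                *roleARN = arn
            }
            return nil
        }).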
        Build()
    if err != nil {
        return nil, err
    }

    // Second resource: the cloud provider bucket.
    // The role's data extractor populates *roleARN earlier in this same reconcile
    // cycle, which causes the guard to clear. The mutation then runs lazily at
    // Mutate() time and injects the now-populated *roleARN into the bucket spec.
    bucketRes, err := static.NewBuilder(newCloudBucket(owner)).
        WithGuard(func(_ uns.Unstructured) (concepts.GuardStatusWithReason, error) {
            if *roleARN == "" {
                return concepts.GuardStatusWithReason{
                    Status: concepts.GuardStatusBlocked,
                    Reason: "waiting for cloud provider role ARN",
                }, nil
            }
            return concepts.GuardStatusWithReason{
                Status: concepts.GuardStatusUnblocked,
            }, nil
        }).
        WithMutation(unstruct.Mutation{
            Name: "set-role-arn",
            Mutate: func(m *unstruct.Mutator) error {
                m.EditContent(func(e *editors.UnstructuredContentEditor) error {
                    return e.SetNestedString(*roleARN, "spec", "roleARN")
                })
                return nil
            },
        }).
        Build()
    if err != nil {
        return nil, err
    }

    // Registration order matters: the role must be registered before the bucket.
    return component.NewComponentBuilder().
        WithName("cloud-resources").
        WithConditionType("CloudResourcesReady").
        WithResource(roleRes, component.ResourceOptions{}).
        WithResource(bucketRes, component.ResourceOptions{}).
        Build()
}

The guard function receives the resource's object but is not required to use it. Guards that only check external state (closure variables populated by prior extractors) can ignore the parameter.

Guard Behavior

  • Guards are evaluated in resource registration order, before each resource is applied.
  • When a guard returns Blocked, the blocked resource contributes a Blocked status to the component condition regardless of the resource's participation mode. All resources after it are skipped entirely. This override exists because a blocked guard halts the entire pipeline, and subsequent required resources would otherwise be silently absent from health aggregation.
  • On the next reconciliation cycle, if the guard clears (returns Unblocked), the resource is applied normally.
  • Guards are not evaluated during suspension. The suspension path always proceeds regardless of guard state.
  • A guard evaluation error is treated as a reconciliation failure and sets the component condition to Error.
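The interplay between registration order, guards, and per-resource extraction can be modeled without a cluster. In the sketch below, step and run are hypothetical simplifications: "applying" a resource just records its name, and guards return a plain boolean instead of GuardStatusWithReason.

```go
package main

import "fmt"

// step models one registered resource. Guard and Extract are optional.
type step struct {
	Name    string
	Guard   func() (blocked bool, reason string)
	Extract func()
}

// run processes steps in registration order. A blocked guard stops the
// pipeline: the blocked step and every later step are skipped.
func run(steps []step) (applied []string, blockedReason string) {
	for _, s := range steps {
		if s.Guard != nil {
			if blocked, reason := s.Guard(); blocked {
				return applied, reason
			}
		}
		applied = append(applied, s.Name)
		if s.Extract != nil {
			s.Extract() // runs immediately, before the next step's guard
		}
	}
	return applied, ""
}

func main() {
	providerARN := "" // what the (simulated) cluster object currently reports
	roleARN := ""     // shared variable populated by the role's extractor

	steps := []step{
		{Name: "role", Extract: func() { roleARN = providerARN }},
		{Name: "bucket", Guard: func() (bool, string) {
			return roleARN == "", "waiting for cloud provider role ARN"
		}},
	}

	fmt.Println(run(steps)) // prints: [role] waiting for cloud provider role ARN

	providerARN = "arn:example" // the provider has now populated the ARN
	fmt.Println(run(steps))     // prints: [role bucket]
}
```

On the second call the guard clears within the same cycle in which the extractor observes the ARN, which is why the role must be registered before the bucket.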

Status Reporting

A blocked guard produces a condition like:

type: CloudResourcesReady
status: "False"
reason: Blocked
message: "waiting for cloud provider role ARN"

The Blocked status is not sticky. It persists only because the guard is re-evaluated on every reconcile and keeps returning Blocked. When the guard clears, the status immediately transitions to the next applicable state (e.g., Creating).

Best Practices

Keep controllers thin. The controller's job is to fetch the owner CRD, decide which components should exist, and call Reconcile on each. Resource-level logic belongs in the component and its primitives.

One component per user-visible feature. If you want a WebInterfaceReady and a DatabaseReady condition on your CRD, those are two separate components.

Group by lifecycle. Resources that must live and die together belong in the same component. If they have independent lifecycles, split them.

Use ParticipationModeAuxiliary for non-critical resources. A metrics exporter sidecar should not block your primary component from becoming Ready. All resource types default to ParticipationModeRequired, so set ParticipationModeAuxiliary explicitly when a resource's health should not gate the component condition.
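The effect of participation mode on health aggregation, including the blocked-guard exception, can be sketched as a filter. The health struct and contributing function are hypothetical; states are plain strings for illustration.

```go
package main

import "fmt"

// health is one resource's reported state plus its participation mode.
type health struct {
	Name      string
	State     string
	Auxiliary bool
}

// contributing keeps the entries that feed the component condition.
// Auxiliary resources are dropped unless they are Blocked, because a
// blocked guard halts the whole pipeline.
func contributing(all []health) []health {
	var out []health
	for _, h := range all {
		if h.Auxiliary && h.State != "Blocked" {
			continue
		}
		out = append(out, h)
	}
	return out
}

func main() {
	got := contributing([]health{
		{Name: "api-deployment", State: "Healthy"},
		{Name: "metrics-exporter", State: "Creating", Auxiliary: true},
	})
	fmt.Println(len(got), got[0].Name) // prints: 1 api-deployment
}
```

With the exporter marked auxiliary, only the Healthy deployment contributes, so the component can report Ready while the exporter is still being created.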