1 change: 1 addition & 0 deletions KubeOps.slnx
@@ -4,6 +4,7 @@
<Project Path="examples\AspireOperator\AspireOperator.csproj" />
<Project Path="examples\ConversionWebhookOperator\ConversionWebhookOperator.csproj" />
<Project Path="examples\Operator\Operator.csproj" />
<Project Path="examples\OtelOperator\OtelOperator.csproj" />
<Project Path="examples\WebhookOperator\WebhookOperator.csproj" />
</Folder>
<Folder Name="/solution/">
1 change: 1 addition & 0 deletions README.md
@@ -21,6 +21,7 @@ The documentation is also provided within the code itself (description of method
- **Code Generation:** Includes Roslyn source generators and a CLI tool (`kubeops`) to automate boilerplate code for CRDs, controllers, and RBAC rules.
- **Enhanced Kubernetes Client:** Provides convenience methods built on top of the official client library.
- **Leader Election:** Automatic handling for high-availability operator deployments.
- **Metrics:** Built-in [OpenTelemetry](https://opentelemetry.io/) metrics for the reconciliation queue, reconciler, and watchers, exportable via any OpenTelemetry exporter.

## Getting Started

5 changes: 5 additions & 0 deletions docs/docs/operator/logging.mdx
@@ -85,6 +85,11 @@ To enable scopes with OpenTelemetry, configure it as follows:
The scope state must be an `IReadOnlyDictionary<string, object?>` to ensure correct serialization and inclusion in log entries.
:::

:::tip
Besides tracing, the operator also emits OpenTelemetry **metrics** for its reconciliation pipeline
(queue depth, reconciliation count/duration, watch events, and more). See [Metrics](./metrics).
:::

## Tracing with `System.Diagnostics` and `ActivitySource`

For [distributed tracing](https://learn.microsoft.com/en-us/dotnet/core/diagnostics/distributed-tracing-concepts), this project uses `System.Diagnostics` in combination with `ActivitySource`.
147 changes: 147 additions & 0 deletions docs/docs/operator/metrics.mdx
@@ -0,0 +1,147 @@
---
title: Metrics
description: OpenTelemetry metrics for the operator pipeline
sidebar_position: 6.5
---

# Metrics

KubeOps emits [OpenTelemetry](https://opentelemetry.io/) metrics for its reconciliation pipeline
through a [`Meter`](https://learn.microsoft.com/dotnet/core/diagnostics/metrics) named after the
operator (`OperatorSettings.Name`) — the same identifier used for the tracing `ActivitySource`.

Metrics collection is enabled by default and costs virtually nothing while no listener or exporter is
attached. To actually collect or scrape the data, register an OpenTelemetry exporter for the meter.

## Enabling / disabling

Metrics collection is controlled by `OperatorSettings.EnableMetrics` (default `true`). Disable it via
the fluent builder:

```csharp
builder.Services.AddKubernetesOperator(settings => settings
.WithMetrics(false));
```

When disabled, the metrics infrastructure is not registered and the instrumentation in the watcher,
queue, and reconciler is skipped entirely.

## Instruments

All instruments carry a `kubeops.entity.type` tag (the watched entity's type name, e.g. `V1MyResource`).

| Name | Type | Unit | Additional tags |
|------|------|------|-----------------|
| `kubeops.operator.queue.depth` | ObservableGauge | `{items}` | `kubeops.queue.state` (`scheduled` \| `ready`) |
| `kubeops.operator.queue.enqueued` | Counter | `{items}` | `kubeops.trigger.source` (`api_server` \| `operator`) |
| `kubeops.operator.queue.requeued` | Counter | `{items}` | `kubeops.requeue.reason` (`conflict` \| `error_retry` \| `operator_requeue`) |
| `kubeops.operator.queue.discarded` | Counter | `{items}` | — |
| `kubeops.operator.reconciliation` | Counter | `{reconciliations}` | `kubeops.reconciliation.type` (`added` \| `modified` \| `deleted`), `kubeops.reconciliation.status` (`success` \| `failure`), `error.type` (on failure) |
| `kubeops.operator.reconciliation.duration` | Histogram | `s` | `kubeops.reconciliation.type`, `kubeops.reconciliation.status`, `error.type` (on failure) |
| `kubeops.operator.watcher.events` | Counter | `{events}` | `kubeops.watcher.event.type` (`added` \| `modified` \| `deleted` \| `bookmark`) |
| `kubeops.operator.watcher.reconnections` | Counter | `{reconnections}` | — |

The `kubeops.operator.queue.depth` gauge reports two series: `scheduled` (entries waiting for a delayed
requeue) and `ready` (entries waiting to be picked up by the reconciliation loop).

:::note
`kubeops.operator.queue.requeued` is a **subset** of `kubeops.operator.queue.enqueued`: every requeue (conflict,
error-retry, or operator requeue) also increments the enqueued counter. Do not add the two together
when building dashboards — use `requeued` for the per-reason breakdown of requeues only.

The `kubeops.trigger.source` tag on `kubeops.operator.queue.enqueued` reflects the *original* event source. An
error-retry therefore keeps its original source (e.g. `api_server`) rather than `operator`; use
`kubeops.operator.queue.requeued{kubeops.requeue.reason="error_retry"}` to count retries explicitly.
:::

:::note
The queue runs side by side with the watcher rather than strictly in front of the reconciler, so the
queue instruments give a good, but not exhaustive, view of throughput. See
[issue #1037](https://github.com/dotnet/dotnet-operator-sdk/issues/1037) for context.
:::

The `error.type` attribute is only present on **failed** reconciliations and carries the failing
exception's full type name (or `_OTHER` when a reconciliation reports failure without an exception).
It follows the OpenTelemetry `error.type` convention and is bounded by the set of exception types your
controllers throw.

The `kubeops.operator.reconciliation.duration` histogram uses second-scale bucket boundaries
(`5ms … 60s`) tuned for typical reconcile latencies, so `histogram_quantile()` over
`kubeops_operator_reconciliation_duration_seconds_bucket` yields meaningful percentiles out of the box.

### Prometheus exposition names

The instrument names above are the OpenTelemetry names. The Prometheus exporter translates them
(dots → underscores, `_total` suffix for counters, unit suffix for the histogram, UCUM annotation
units such as `{items}` dropped). The scrape endpoint therefore exposes:

| OpenTelemetry instrument | Prometheus time series |
|---|---|
| `kubeops.operator.queue.depth` | `kubeops_operator_queue_depth` |
| `kubeops.operator.queue.enqueued` | `kubeops_operator_queue_enqueued_total` |
| `kubeops.operator.queue.requeued` | `kubeops_operator_queue_requeued_total` |
| `kubeops.operator.queue.discarded` | `kubeops_operator_queue_discarded_total` |
| `kubeops.operator.reconciliation` | `kubeops_operator_reconciliation_total` |
| `kubeops.operator.reconciliation.duration` | `kubeops_operator_reconciliation_duration_seconds` (`_bucket` / `_sum` / `_count`) |
| `kubeops.operator.watcher.events` | `kubeops_operator_watcher_events_total` |
| `kubeops.operator.watcher.reconnections` | `kubeops_operator_watcher_reconnections_total` |
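To make the translation rules concrete, the following minimal C# sketch reproduces them for the instruments in the table above. It is illustrative only and is not the exporter's code: `PrometheusNameSketch` and `Translate` are hypothetical names, the instrument type is passed as a plain string, and only the `s` unit is mapped (the real exporter handles many more UCUM units).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the OpenTelemetry -> Prometheus name translation,
// limited to the cases that occur in the KubeOps instrument table.
public static class PrometheusNameSketch
{
    // Only the unit used by the KubeOps histogram is mapped here.
    private static readonly Dictionary<string, string> UnitSuffixes = new()
    {
        ["s"] = "seconds",
    };

    public static string Translate(string instrumentName, string instrumentType, string unit)
    {
        // Dots are not valid in Prometheus metric names.
        var name = instrumentName.Replace('.', '_');

        // Annotation units such as "{items}" carry no real unit and are dropped;
        // known units become a suffix (e.g. "s" -> "_seconds").
        var isAnnotation = unit.StartsWith('{') && unit.EndsWith('}');
        if (!isAnnotation && UnitSuffixes.TryGetValue(unit, out var suffix))
        {
            name += "_" + suffix;
        }

        // Monotonic counters get the "_total" suffix.
        if (instrumentType == "Counter")
        {
            name += "_total";
        }

        return name;
    }
}
```

For example, `Translate("kubeops.operator.reconciliation.duration", "Histogram", "s")` yields the base name `kubeops_operator_reconciliation_duration_seconds`, on top of which the exporter emits the `_bucket`, `_sum`, and `_count` series.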

## Exposing a Prometheus endpoint (KubeOps.Operator.Web)

Metrics export is configured through the standard OpenTelemetry pipeline, separate from the operator
registration chain. `KubeOps.Operator.Web` provides two helpers: `AddKubeOpsInstrumentation()` on the
`MeterProviderBuilder` subscribes to the operator's meter (the operator name is resolved from the
registered `OperatorSettings`, so you don't have to repeat it), and `MapOperatorMetricsEndpoint()`
exposes the Prometheus scraping endpoint:

```csharp
var builder = WebApplication.CreateBuilder(args);

builder.Services
    .AddKubernetesOperator()
    .RegisterComponents();

// Required by app.MapControllers() below.
builder.Services.AddControllers();

// NuGet: OpenTelemetry.Extensions.Hosting, OpenTelemetry.Exporter.Prometheus.AspNetCore
builder.Services
    .AddOpenTelemetry()
    .WithMetrics(m => m
        .AddKubeOpsInstrumentation()   // subscribes to the operator meter
        .AddPrometheusExporter());

var app = builder.Build();
app.UseRouting();
app.MapControllers();
app.MapOperatorMetricsEndpoint();     // exposes GET /metrics

app.Run();
```

Pass the name explicitly with `AddKubeOpsInstrumentation(operatorName)` if `AddKubernetesOperator()`
has not run yet on the same service collection.

## Manual exporter configuration

Without `KubeOps.Operator.Web`, you can register any OpenTelemetry exporter yourself. Add the meter by
the operator name (the value of `OperatorSettings.Name`) and pick an exporter:

```csharp
// Standalone HttpListener (no ASP.NET Core)
// NuGet: OpenTelemetry.Extensions.Hosting, OpenTelemetry.Exporter.Prometheus.HttpListener
builder.Services
    .AddOpenTelemetry()
    .WithMetrics(m => m
        .AddMeter(operatorName)
        // 9464 is the Prometheus convention for the metrics scrape port.
        .AddPrometheusHttpListener(o => o.UriPrefixes = ["http://+:9464/"]));
```

```csharp
// OTLP to an OpenTelemetry Collector
// NuGet: OpenTelemetry.Extensions.Hosting, OpenTelemetry.Exporter.OpenTelemetryProtocol
builder.Services
    .AddOpenTelemetry()
    .WithMetrics(m => m
        .AddMeter(operatorName)
        .AddOtlpExporter());
```

:::tip
If you already use [.NET Aspire](./aspire) via `KubeOps.Aspire`, the meter is picked up automatically
by `AddKubeOpsServiceDefaults`, which configures OpenTelemetry with OTLP export.
:::
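Independently of any exporter package, the instruments can be observed in-process with `System.Diagnostics.Metrics.MeterListener`, which is handy in integration tests. The sketch below is a hypothetical helper, not part of KubeOps: `OperatorMetricsProbe` and `Totals` are illustrative names, and it assumes the counters record `long` values (adjust the generic argument if they use another type). Double-valued measurements such as the duration histogram are ignored, and observable instruments like the queue-depth gauge only report when `RecordObservableInstruments()` is called, which this sketch never does.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics.Metrics;

// Hypothetical test helper: sums long-valued measurements recorded by the
// operator's Meter, keyed by instrument name.
public sealed class OperatorMetricsProbe : IDisposable
{
    private readonly MeterListener _listener = new();

    public ConcurrentDictionary<string, long> Totals { get; } = new();

    public OperatorMetricsProbe(string operatorName)
    {
        // Enable only instruments whose Meter is named after OperatorSettings.Name.
        _listener.InstrumentPublished = (instrument, listener) =>
        {
            if (instrument.Meter.Name == operatorName)
            {
                listener.EnableMeasurementEvents(instrument);
            }
        };

        // Assumption: counters record long values; other measurement types are not handled here.
        _listener.SetMeasurementEventCallback<long>((instrument, measurement, tags, state) =>
            Totals.AddOrUpdate(instrument.Name, measurement, (_, total) => total + measurement));

        _listener.Start();
    }

    public void Dispose() => _listener.Dispose();
}
```

Create the probe before starting the operator (for the example operator, `new OperatorMetricsProbe("otel-operator")`), then read a value such as `probe.Totals["kubeops.operator.reconciliation"]` once the test has applied a resource.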
32 changes: 32 additions & 0 deletions examples/OtelOperator/Controller/V1OtelDemoEntityController.cs
@@ -0,0 +1,32 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the Apache 2.0 License.
// See the LICENSE file in the project root for more information.

using k8s.Models;

using KubeOps.Abstractions.Rbac;
using KubeOps.Abstractions.Reconciliation;
using KubeOps.Abstractions.Reconciliation.Controller;

using OtelOperator.Entities;

namespace OtelOperator.Controller;

[EntityRbac(typeof(V1OtelDemoEntity), Verbs = RbacVerb.All)]
public sealed class V1OtelDemoEntityController(ILogger<V1OtelDemoEntityController> logger)
: IEntityController<V1OtelDemoEntity>
{
public Task<ReconciliationResult<V1OtelDemoEntity>> ReconcileAsync(
V1OtelDemoEntity entity, CancellationToken cancellationToken)
{
logger.LogInformation("Reconciling entity {Namespace}/{Name}.", entity.Namespace(), entity.Name());
return Task.FromResult(ReconciliationResult<V1OtelDemoEntity>.Success(entity));
}

public Task<ReconciliationResult<V1OtelDemoEntity>> DeletedAsync(
V1OtelDemoEntity entity, CancellationToken cancellationToken)
{
logger.LogInformation("Deleted entity {Namespace}/{Name}.", entity.Namespace(), entity.Name());
return Task.FromResult(ReconciliationResult<V1OtelDemoEntity>.Success(entity));
}
}
18 changes: 18 additions & 0 deletions examples/OtelOperator/Entities/V1OtelDemoEntity.cs
@@ -0,0 +1,18 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the Apache 2.0 License.
// See the LICENSE file in the project root for more information.

using k8s.Models;

using KubeOps.Abstractions.Entities;

namespace OtelOperator.Entities;

[KubernetesEntity(Group = "testing.dev", ApiVersion = "v1", Kind = "OtelDemoEntity")]
public sealed partial class V1OtelDemoEntity : CustomKubernetesEntity<V1OtelDemoEntity.EntitySpec>
{
public sealed class EntitySpec
{
public string Message { get; set; } = string.Empty;
}
}
22 changes: 22 additions & 0 deletions examples/OtelOperator/OtelOperator.csproj
@@ -0,0 +1,22 @@
<Project Sdk="Microsoft.NET.Sdk.Web">

<PropertyGroup>
<TargetFramework>net9.0</TargetFramework>
<Nullable>enable</Nullable>
<ImplicitUsings>enable</ImplicitUsings>
<IsPackable>false</IsPackable>
</PropertyGroup>

<ItemGroup>
<ProjectReference Include="..\..\src\KubeOps.Generator\KubeOps.Generator.csproj"
OutputItemType="Analyzer"
ReferenceOutputAssembly="false" />
<ProjectReference Include="..\..\src\KubeOps.Operator.Web\KubeOps.Operator.Web.csproj" />
</ItemGroup>

<ItemGroup>
<PackageReference Include="OpenTelemetry.Extensions.Hosting" Version="1.16.0" />
<PackageReference Include="OpenTelemetry.Exporter.OpenTelemetryProtocol" Version="1.16.0" />
</ItemGroup>

</Project>
48 changes: 48 additions & 0 deletions examples/OtelOperator/Program.cs
@@ -0,0 +1,48 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the Apache 2.0 License.
// See the LICENSE file in the project root for more information.

using KubeOps.Abstractions.Builder;
using KubeOps.Operator;
using KubeOps.Operator.Web.Builder;

using OpenTelemetry;
using OpenTelemetry.Metrics;
using OpenTelemetry.Trace;

const string operatorName = "otel-operator";

var builder = WebApplication.CreateBuilder(args);

// Operator registration only contains operator building blocks.
builder.Services
.AddKubernetesOperator(settings => settings.WithName(operatorName))
.RegisterComponents();

// Observability is wired up separately through the standard OpenTelemetry pipeline.
// AddKubeOpsInstrumentation() subscribes to the operator's Meter and AddSource() to its
// ActivitySource (both named after OperatorSettings.Name). UseOtlpExporter() then exports
// every signal (metrics and traces) to an OTLP collector in a single, global call -
// configure the endpoint via the OTEL_EXPORTER_OTLP_ENDPOINT environment variable.
builder.Services
.AddOpenTelemetry()
.WithMetrics(metrics => metrics
.AddKubeOpsInstrumentation())
.WithTracing(tracing => tracing
.AddSource(operatorName))
.UseOtlpExporter();

var app = builder.Build();

app.UseRouting();

// Alternative to OTLP: expose a Prometheus scraping endpoint instead. Swap the metrics
// exporter for the Prometheus one (NuGet: OpenTelemetry.Exporter.Prometheus.AspNetCore)
// and map the endpoint:
//
// .WithMetrics(metrics => metrics
// .AddKubeOpsInstrumentation()
// .AddPrometheusExporter())
//
// app.MapOperatorMetricsEndpoint(); // exposes GET /metrics

await app.RunAsync();
7 changes: 7 additions & 0 deletions renovate.json
@@ -71,6 +71,13 @@
"matchPackageNames": ["Aspire.Hosting.Kubernetes"],
"ignoreUnstable": false,
"respectLatest": false
},
{
"description": ["OpenTelemetry.Exporter.Prometheus.AspNetCore only ships prerelease versions, so allow unstable updates for it."],
"matchManagers": ["nuget"],
"matchPackageNames": ["OpenTelemetry.Exporter.Prometheus.AspNetCore"],
"ignoreUnstable": false,
"respectLatest": false
}
]
}
11 changes: 11 additions & 0 deletions src/KubeOps.Abstractions/Builder/OperatorSettings.cs
@@ -117,4 +117,15 @@ public sealed record OperatorSettings
/// <seealso cref="ParallelReconciliationSettings"/>
/// <seealso cref="ParallelReconciliationConflictStrategy"/>
public required ParallelReconciliationSettings ParallelReconciliation { get; init; }

/// <summary>
/// Indicates whether the operator collects OpenTelemetry metrics (queue, watcher, and
/// reconciliation instruments) via a <see cref="System.Diagnostics.Metrics.Meter"/> named
/// after <see cref="Name"/>.
/// </summary>
/// <remarks>
/// Collecting metrics is virtually free when no listener/exporter is attached. To actually
/// scrape the metrics, register an OpenTelemetry exporter for the meter named <see cref="Name"/>.
/// </remarks>
public required bool EnableMetrics { get; init; }
}
8 changes: 8 additions & 0 deletions src/KubeOps.Abstractions/Builder/OperatorSettingsBuilder.cs
@@ -106,6 +106,13 @@ public sealed partial class OperatorSettingsBuilder
/// <seealso cref="ParallelReconciliationConflictStrategy"/>
public ParallelReconciliationSettingsBuilder ParallelReconciliation { get; set; } = new();

/// <summary>
/// Indicates whether the operator collects OpenTelemetry metrics. Defaults to <c>true</c>.
/// Collecting is virtually free without an attached listener/exporter; set to <c>false</c> to
/// skip registering the metrics infrastructure entirely.
/// </summary>
public bool EnableMetrics { get; set; } = true;

/// <summary>
/// Produces an immutable <see cref="OperatorSettings"/> record from the current configuration.
/// </summary>
@@ -124,6 +131,7 @@ public sealed partial class OperatorSettingsBuilder
AutoDetachFinalizers = AutoDetachFinalizers,
ReconcileStrategy = ReconcileStrategy,
ParallelReconciliation = ParallelReconciliation.Build(),
EnableMetrics = EnableMetrics,
};

[GeneratedRegex(@"(\W|_)", RegexOptions.CultureInvariant)]
@@ -150,6 +150,17 @@ public static OperatorSettingsBuilder WithReconcileStrategy(
return builder;
}

/// <summary>Sets whether the operator collects OpenTelemetry metrics.</summary>
/// <param name="builder">The builder to configure.</param>
/// <param name="value"><c>true</c> to collect metrics (default); <c>false</c> to disable.</param>
/// <returns>The same <paramref name="builder"/> instance for chaining.</returns>
public static OperatorSettingsBuilder WithMetrics(
this OperatorSettingsBuilder builder, bool value = true)
{
builder.EnableMetrics = value;
return builder;
}

/// <summary>Configures parallel reconciliation settings inline via a delegate.</summary>
/// <param name="builder">The builder to configure.</param>
/// <param name="configure">An action that configures the <see cref="ParallelReconciliationSettingsBuilder"/>.</param>
7 changes: 3 additions & 4 deletions src/KubeOps.Aspire.Hosting/KubeOps.Aspire.Hosting.csproj
@@ -2,9 +2,6 @@

<PropertyGroup>
<TargetFrameworks>net8.0;net9.0;net10.0</TargetFrameworks>
<!-- Aspire.Hosting.Kubernetes is only published as a prerelease package.
Allow this stable package to depend on it (NU5104). -->
<NoWarn>$(NoWarn);NU5104</NoWarn>
</PropertyGroup>

<PropertyGroup>
@@ -20,7 +17,9 @@

<ItemGroup>
<PackageReference Include="Aspire.Hosting" Version="13.4.3" />
<PackageReference Include="Aspire.Hosting.Kubernetes" Version="13.4.3-preview.1.26305.13" />
<!-- Aspire.Hosting.Kubernetes is only published as a prerelease package.
Allow this stable package to depend on it (NU5104). -->
<PackageReference Include="Aspire.Hosting.Kubernetes" Version="13.4.3-preview.1.26305.13" NoWarn="NU5104" />
<PackageReference Include="YamlDotNet" Version="18.0.0" />
<!-- Pin transitive MessagePack (via Aspire.Hosting -> StreamJsonRpc) to a version
without GHSA-hv8m-jj95-wg3x. Remove once Aspire ships a fixed StreamJsonRpc. -->