OTLP Metrics Integration

Cosmian KMS exports metrics via the OpenTelemetry Protocol (OTLP) over gRPC. This lets you send metrics directly to any OTLP-compatible backend without exposing an HTTP endpoint on the KMS server.

Architecture

┌─────────────┐                    ┌──────────────────┐
│  KMS Server │ ──OTLP/gRPC──────> │ OTLP Collector   │
│             │  (port 4317)       │                  │
└─────────────┘                    └──────────────────┘
                                           │
                        ┌──────────────────┼──────────────────┐
                        ▼                  ▼                  ▼
                  ┌─────────┐        ┌─────────┐      ┌──────────┐
                  │ Jaeger  │        │ Cloud   │      │ Custom   │
                  │         │        │ Provider│      │ Backend  │
                  └─────────┘        └─────────┘      └──────────┘

Configuration

Enable OTLP Metrics in KMS

Configure the OTLP endpoint in your kms.toml:

[logging]
# OTLP endpoint for metrics export
otlp = "http://localhost:4317"

Or via environment variables:

export KMS_OTLP_URL="http://localhost:4317"
export KMS_ENABLE_METERING="true"

Or via command-line flag:

cosmian_kms --otlp http://localhost:4317 --enable-metering

Metrics Export Behavior

  • Automatic: Metrics are sent automatically once the otlp URL is configured
  • Interval: Metrics are pushed every 30 seconds
  • Protocol: gRPC transport (OTLP/gRPC)
  • No HTTP endpoint: KMS does not expose any HTTP /metrics endpoint

Quick Start with Docker

1. Start the OTLP Stack

# Start OTLP Collector and Jaeger
docker compose -f docker-compose.otel.yml up -d

This starts the OTLP Collector and Jaeger.

2. Start KMS with OTLP

# Configure KMS to send metrics to OTLP Collector
cosmian_kms --otlp http://localhost:4317 \
            --database-type sqlite \
            --sqlite-path /tmp/kms-data

3. View Metrics

Available Metrics

The server exposes the following instruments via OTLP, as implemented in crate/server/src/core/otel_metrics.rs.

KMIP Operations

  • kms.kmip.operations.total — Total KMIP operations executed (counter)
  • kms.kmip.operations.per_user.total — Total KMIP operations per user (counter)
  • kms.kmip.operation.duration — Duration of KMIP operations in seconds (histogram)

Users & Permissions

  • kms.active.users — Number of unique active users (up-down counter)
  • kms.permissions.granted.per_user.total — Permissions granted per user (counter)
  • kms.permissions.granted.total — Total permissions granted (counter)

Database Metrics

  • kms.database.operations.total — Total database operations (counter)
  • kms.database.operation.duration — Database operation duration in seconds (histogram)

HTTP Metrics

  • kms.http.requests.total — Total HTTP requests (counter)
  • kms.http.request.duration — HTTP request duration in seconds (histogram)

Server Health

  • kms.server.uptime — Server uptime in seconds (counter)
  • kms.server.start_time — Server start time as Unix timestamp (up-down counter)
  • kms.active.connections — Current number of active connections (up-down counter)
  • kms.errors.total — Total number of errors by type (counter)

Objects & Keys

  • kms.objects.total — Total number of objects (up-down counter)
  • kms.keys.active.count — Number of keys in Active state (up-down counter; absolute count applied via delta)

Cache

  • kms.cache.operations.total — Total cache operations (counter)

HSM

  • kms.hsm.operations.total — Total HSM operations (counter)
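Most of the instruments above are counters, so a dashboard usually shows their rate of change rather than the raw total. The following is a minimal sketch of that calculation, assuming the counter is exported as a cumulative total (this document does not state the temporality) and using the 30-second export interval mentioned earlier. The `CounterSample` type and the sample values are hypothetical and only illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSample:
    """One exported data point of a cumulative counter (hypothetical shape)."""
    timestamp_s: float  # export time in seconds
    value: float        # cumulative total at that time


def rate_per_second(prev: CounterSample, curr: CounterSample) -> float:
    """Average increase per second between two samples.

    A drop in value is treated as a counter reset (for example a server
    restart), in which case the current value is taken as the increase.
    """
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    increase = curr.value - prev.value
    if increase < 0:
        increase = curr.value
    return increase / elapsed


# Two samples of kms.kmip.operations.total, one 30-second interval apart.
earlier = CounterSample(timestamp_s=0.0, value=1200.0)
later = CounterSample(timestamp_s=30.0, value=1500.0)
print(rate_per_second(earlier, later))  # 10.0
```

In practice the metrics backend normally performs this calculation itself. The sketch only shows what a rate over one export interval means.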

OTLP Collector Configuration

The otel-collector-config.yaml receives metrics from KMS and forwards them to the configured backends:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  otlp:
    endpoint: ${env:OTLP_ENDPOINT}  # Forward to Jaeger, etc.

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [resource, batch]
      exporters: [otlp, debug]
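The Collector refuses to start when a pipeline references a component that has no matching top-level definition. The snippet above references the resource and batch processors and the debug exporter without defining them, so they need to be defined elsewhere in the full file. The sketch below is one way to check this before deploying. It is not part of KMS or the Collector, the function name `undefined_components` is hypothetical, and the configuration is written as the dictionary a YAML parser would return so the example needs no third-party library.

```python
def undefined_components(config: dict) -> dict:
    """Return pipeline references that have no matching top-level definition."""
    missing = {}
    pipelines = config.get("service", {}).get("pipelines", {})
    for pipeline_name, pipeline in pipelines.items():
        for section in ("receivers", "processors", "exporters"):
            defined = set(config.get(section) or {})
            for name in pipeline.get(section, []):
                if name not in defined:
                    missing.setdefault(pipeline_name, []).append(f"{section}/{name}")
    return missing


# The configuration snippet above, as a parsed dictionary.
config = {
    "receivers": {"otlp": {"protocols": {"grpc": {"endpoint": "0.0.0.0:4317"}}}},
    "exporters": {"otlp": {"endpoint": "${env:OTLP_ENDPOINT}"}},
    "service": {
        "pipelines": {
            "metrics": {
                "receivers": ["otlp"],
                "processors": ["resource", "batch"],
                "exporters": ["otlp", "debug"],
            }
        }
    },
}

print(undefined_components(config))
# {'metrics': ['processors/resource', 'processors/batch', 'exporters/debug']}
```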

Cloud Provider Integration

Send to Datadog

# Configure OTLP Collector to export to Datadog
export DD_SITE="datadoghq.com"
export DD_API_KEY="your-api-key"

# Update otel-collector-config.yaml
exporters:
  datadog:
    api:
      key: ${env:DD_API_KEY}
      site: ${env:DD_SITE}

Send to New Relic

export NEW_RELIC_LICENSE_KEY="your-license-key"

# Update otel-collector-config.yaml
exporters:
  otlp:
    endpoint: otlp.nr-data.net:4317
    headers:
      api-key: ${env:NEW_RELIC_LICENSE_KEY}

Send to Grafana Cloud

export GRAFANA_INSTANCE_ID="your-instance-id"
export GRAFANA_API_KEY="your-api-key"

# Update otel-collector-config.yaml
exporters:
  otlp:
    endpoint: otlp-gateway-${env:GRAFANA_INSTANCE_ID}.grafana.net:4317
    headers:
      authorization: "Bearer ${env:GRAFANA_API_KEY}"

Production Deployment

Security Best Practices

  1. Use TLS for OTLP transport:
[logging]
otlp = "https://collector.example.com:4317"
  2. Authentication: Configure API keys in the OTLP Collector:
exporters:
  otlp:
    headers:
      authorization: "Bearer ${env:API_TOKEN}"
  3. Network isolation: Run the OTLP Collector in a private network
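A deployment script can enforce the TLS recommendation before the server starts. The sketch below is a hypothetical pre-flight check, not something KMS provides. It rejects an OTLP URL that does not use https and returns the host and port otherwise. The function name `check_otlp_url` and the fallback to port 4317 when the URL has no port are assumptions made for this example.

```python
from urllib.parse import urlsplit


def check_otlp_url(url: str, require_tls: bool = True) -> tuple[str, int]:
    """Split an OTLP URL into (host, port); reject plain HTTP if TLS is required."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if require_tls and parts.scheme != "https":
        raise ValueError(f"{url} does not use TLS")
    if parts.hostname is None:
        raise ValueError(f"{url} has no host")
    # 4317 is the gRPC port used throughout this document (assumed default here).
    return parts.hostname, parts.port or 4317


print(check_otlp_url("https://collector.example.com:4317"))
# ('collector.example.com', 4317)
```

Calling `check_otlp_url("http://localhost:4317")` raises `ValueError`, which is the intended behavior for a production check. Pass `require_tls=False` for local development.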

High Availability

Deploy multiple OTLP Collectors with load balancing:

[logging]
otlp = "https://otlp-lb.example.com:4317"

Troubleshooting

No metrics appearing

  1. Check KMS logs for OTLP connection errors:

    cosmian_kms --log-level debug

  2. Check Collector logs:

    docker compose -f docker-compose.otel.yml logs -f otel-collector

Metrics export errors

  • Ensure the otlp URL in the configuration is correct
  • Check network connectivity from the KMS host to the OTLP Collector
  • Verify that the Collector has the correct exporters configured
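The connectivity check can be scripted. The sketch below only tests whether a TCP connection to the Collector's host and port can be opened. It does not speak gRPC or OTLP, so a successful result rules out basic network problems but not TLS or authentication errors. The function name `collector_reachable` and the fallback to port 4317 are assumptions made for this example.

```python
import socket
from urllib.parse import urlsplit


def collector_reachable(url: str, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 4317  # assumed default: the gRPC port used in this document
    if host is None:
        raise ValueError(f"{url} has no host")
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


if __name__ == "__main__":
    print(collector_reachable("http://localhost:4317"))
```

Run it from the machine that hosts KMS, since that is the network path the metrics take.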

Files Reference

| File | Purpose |
|------|---------|
| otel-collector-config.yaml | OTLP Collector configuration |
| docker-compose.otel.yml | Local development stack |
| crate/server/src/core/otel_metrics.rs | Metrics instruments and recording helpers |

Differences from HTTP /metrics Endpoint

Previous Architecture (Removed):

  • KMS exposed HTTP /metrics endpoint
  • External scrapers pulled metrics
  • Security concerns with exposed endpoint

Current Architecture:

  • KMS pushes metrics via OTLP
  • No HTTP endpoint exposure
  • More secure and flexible
  • Cloud-native standard

Additional Resources