Skip to content

feat: add basic api metric (detecting api client versions)#1102

Open
dacbd wants to merge 4 commits intomainfrom
dacbd/basic-api-metrics
Open

feat: add basic api metric (detecting api client versions)#1102
dacbd wants to merge 4 commits intomainfrom
dacbd/basic-api-metrics

Conversation

@dacbd
Copy link
Copy Markdown
Collaborator

@dacbd dacbd commented May 5, 2026

Summary by CodeRabbit

  • New Features
    • Added OpenTelemetry metrics support to the API for monitoring HTTP requests.
    • Metrics capture request method, status code, route, and client information.
    • Metrics exporting configurable via environment variables (OTLP endpoints).
    • Default service name set to "ctrlplane-api" with an override option.

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented May 5, 2026

CLA assistant check
All committers have signed the CLA.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 5, 2026

📝 Walkthrough

Walkthrough

Adds OpenTelemetry metrics: dependencies, env config, startup instrumentation, an Express middleware to count HTTP requests with labels, and wiring to export metrics periodically.

Changes

OpenTelemetry Metrics Instrumentation

Layer / File(s) Summary
Dependencies
apps/api/package.json
Added OpenTelemetry metric packages: @opentelemetry/api, @opentelemetry/exporter-metrics-otlp-http, @opentelemetry/resources, @opentelemetry/sdk-metrics.
Configuration
apps/api/src/config.ts
Added env vars: OTEL_EXPORTER_OTLP_ENDPOINT?, OTEL_EXPORTER_OTLP_METRICS_ENDPOINT?, and OTEL_SERVICE_NAME (default "ctrlplane-api").
Startup Import
apps/api/src/index.ts
Imports @/instrumentation.js at process startup (before env/app) to initialize metrics on startup.
Instrumentation Core
apps/api/src/instrumentation.ts
Resolves metricsUrl from env, derives /v1/metrics fallback, creates MeterProvider with service.name, installs PeriodicExportingMetricReader + OTLPMetricExporter (30s), registers global provider, and wires graceful shutdown.
Middleware
apps/api/src/middleware/metrics.ts
Adds metricsMiddleware that increments a lazily-created http.server.requests_by_client counter on res.finish with attributes: simplified client (from simplifyUserAgent), method, status_code, and route. Exports simplifyUserAgent and metricsMiddleware.
Server Wiring
apps/api/src/server.ts
Inserts metricsMiddleware into Express middleware chain after body parsers and before logger middleware.
Lint Config
packages/workspace-engine-sdk/eslint.config.js
Replaced empty export with configured array importing @ctrlplane/eslint-config/base and adding ignores for dist/**, node_modules/**, src/schema.ts.

Sequence Diagram

sequenceDiagram
    participant Proc as Process Start
    participant Instr as Instrumentation (apps/api/src/instrumentation.ts)
    participant MP as MeterProvider
    participant Exporter as OTLP Exporter
    participant App as Express App
    participant Middleware as metricsMiddleware
    participant Client as HTTP Client

    Proc->>Instr: import "@/instrumentation.js"
    Instr->>Instr: resolve metricsUrl from env
    Instr->>MP: create MeterProvider (resource: service.name)
    MP->>Exporter: register PeriodicExportingMetricReader (30s)
    Instr->>Instr: set global meter provider

    Client->>App: HTTP request
    App->>Middleware: pass request
    Middleware->>Middleware: simplifyUserAgent, resolve route
    Middleware->>MP: getCounter().add(1, {client,method,status_code,route})
    Exporter->>Exporter: batch & export metrics every 30s
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • jsbroks
  • adityachoudhari26

"I hopped through code with a twitchy ear,
Counting requests both far and near;
Metrics take flight, exports in sight,
A rabbit's cheer for observability delight! 🐇📈"

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main objective of the PR: adding basic API metrics with detection of client versions, which aligns with the middleware implementation that tracks HTTP requests by client.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch dacbd/basic-api-metrics

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
apps/api/src/instrumentation.ts (1)

21-32: ⚡ Quick win

Add MeterProvider.shutdown() on process exit to flush the final metrics batch.

Without SIGTERM/SIGINT handlers, in-memory counter increments accumulated since the last 30-second export are silently dropped on every graceful shutdown (e.g., rolling deploy, pod eviction). The commented-out signal handler stubs in index.ts suggest this was planned. The official OTel JS documentation for the OTLP exporters shows exactly this pattern: ['SIGINT', 'SIGTERM'].forEach(signal => { process.on(signal, () => meterProvider.shutdown().catch(console.error)); })

♻️ Proposed fix
   metrics.setGlobalMeterProvider(meterProvider);
   logger.info(`OTel metrics enabled (endpoint: ${metricsUrl})`);
+
+  const shutdown = () =>
+    meterProvider.shutdown().catch((err) => logger.error("OTel metrics shutdown error", err));
+  process.on("SIGTERM", shutdown);
+  process.on("SIGINT", shutdown);
 } else {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/api/src/instrumentation.ts` around lines 21 - 32, When metricsUrl is set
and you create the MeterProvider, wire up process SIGINT/SIGTERM handlers to
call meterProvider.shutdown() so the PeriodicExportingMetricReader flushes
pending metrics on exit; add ['SIGINT','SIGTERM'].forEach(signal =>
process.on(signal, () => meterProvider.shutdown().catch(err =>
console.error('meterProvider.shutdown error', err)))) (ensure handlers reference
the created meterProvider instance and only register when meterProvider is
defined) so final batches from OTLPMetricExporter are exported before process
termination.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@apps/api/src/instrumentation.ts`:
- Around line 21-32: When metricsUrl is set and you create the MeterProvider,
wire up process SIGINT/SIGTERM handlers to call meterProvider.shutdown() so the
PeriodicExportingMetricReader flushes pending metrics on exit; add
['SIGINT','SIGTERM'].forEach(signal => process.on(signal, () =>
meterProvider.shutdown().catch(err => console.error('meterProvider.shutdown
error', err)))) (ensure handlers reference the created meterProvider instance
and only register when meterProvider is defined) so final batches from
OTLPMetricExporter are exported before process termination.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: efef17d7-c6d7-4154-87d4-126f0ea6e132

📥 Commits

Reviewing files that changed from the base of the PR and between 52046a1 and 42da1f6.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (6)
  • apps/api/package.json
  • apps/api/src/config.ts
  • apps/api/src/index.ts
  • apps/api/src/instrumentation.ts
  • apps/api/src/middleware/metrics.ts
  • apps/api/src/server.ts

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apps/api/src/middleware/metrics.ts`:
- Around line 45-47: The metric label construction that returns
`${name}/${version}` for members of VERSIONED_CLIENT_ALLOWLIST must validate the
version string to prevent high-cardinality labels; update the code around the
label-building logic (the expression that uses
VERSIONED_CLIENT_ALLOWLIST.has(name) and `version`) to only include the version
when it matches a bounded pattern (e.g. semver-like regex or a strict
allowed-set) and otherwise fall back to a stable token such as just `name` or
`name/unknown`; ensure the validation is implemented as a small helper (e.g.
isValidClientVersion(version)) and used before interpolating the version into
the metric label.
- Around line 56-66: getCounter currently caches a noop Counter if called before
a real meter provider is registered, causing all subsequent increments to be
silently dropped; update getCounter (and the module-level counter) to detect
that the global provider is not configured (or that the cached counter is a
noop) and either assert/throw to fail-fast or clear and recreate the counter
once metrics.setGlobalMeterProvider() has been called — specifically modify
getCounter, counter, and the logic around
metrics.getMeter("ctrlplane-api").createCounter("http.server.requests_by_client")
so the function checks provider state (or noop-ness) before returning the cached
counter and resets/recreates it when a real provider becomes available.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fb87988c-dbe9-4670-a42f-2a90409830b9

📥 Commits

Reviewing files that changed from the base of the PR and between 42da1f6 and 6f47658.

📒 Files selected for processing (3)
  • apps/api/src/instrumentation.ts
  • apps/api/src/middleware/metrics.ts
  • packages/workspace-engine-sdk/eslint.config.js
🚧 Files skipped from review as they are similar to previous changes (1)
  • apps/api/src/instrumentation.ts

Comment on lines +45 to +47
return VERSIONED_CLIENT_ALLOWLIST.has(name) && version
? `${name}/${version}`
: name;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Unvalidated version string will cause metric-label cardinality explosion.

The raw version substring is passed directly to the OTel counter label for allowlisted clients:

return VERSIONED_CLIENT_ALLOWLIST.has(name) && version
  ? `${name}/${version}`
  : name;

Any client sending User-Agent: ctrlc/<UUID>, ctrlc/<unix-timestamp>, or any high-cardinality string will generate a unique metric label combination per request. Most time-series backends (Prometheus, Thanos, Victoria Metrics) have hard cardinality limits; exceeding them causes OOM crashes or label truncation silently.

Validate version against a bounded pattern (e.g. semver-like) before using it as a label:

🛡️ Proposed fix
+ const VALID_VERSION_RE = /^\d{1,4}\.\d{1,4}(?:\.\d{1,4})?(?:[-+][a-zA-Z0-9._-]{0,32})?$/;

  ...

- return VERSIONED_CLIENT_ALLOWLIST.has(name) && version
-   ? `${name}/${version}`
-   : name;
+ if (VERSIONED_CLIENT_ALLOWLIST.has(name) && version && VALID_VERSION_RE.test(version))
+   return `${name}/${version}`;
+ return name;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/api/src/middleware/metrics.ts` around lines 45 - 47, The metric label
construction that returns `${name}/${version}` for members of
VERSIONED_CLIENT_ALLOWLIST must validate the version string to prevent
high-cardinality labels; update the code around the label-building logic (the
expression that uses VERSIONED_CLIENT_ALLOWLIST.has(name) and `version`) to only
include the version when it matches a bounded pattern (e.g. semver-like regex or
a strict allowed-set) and otherwise fall back to a stable token such as just
`name` or `name/unknown`; ensure the validation is implemented as a small helper
(e.g. isValidClientVersion(version)) and used before interpolating the version
into the metric label.

Comment thread apps/api/src/middleware/metrics.ts
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants