feat: add basic api metric (detecting api client versions)#1102
feat: add basic api metric (detecting api client versions)#1102
Conversation
📝 WalkthroughWalkthroughAdds OpenTelemetry metrics: dependencies, env config, startup instrumentation, an Express middleware to count HTTP requests with labels, and wiring to export metrics periodically. ChangesOpenTelemetry Metrics Instrumentation
Sequence DiagramsequenceDiagram
participant Proc as Process Start
participant Instr as Instrumentation (apps/api/src/instrumentation.ts)
participant MP as MeterProvider
participant Exporter as OTLP Exporter
participant App as Express App
participant Middleware as metricsMiddleware
participant Client as HTTP Client
Proc->>Instr: import "@/instrumentation.js"
Instr->>Instr: resolve metricsUrl from env
Instr->>MP: create MeterProvider (resource: service.name)
MP->>Exporter: register PeriodicExportingMetricReader (30s)
Instr->>Instr: set global meter provider
Client->>App: HTTP request
App->>Middleware: pass request
Middleware->>Middleware: simplifyUserAgent, resolve route
Middleware->>MP: getCounter().add(1, {client,method,status_code,route})
Exporter->>Exporter: batch & export metrics every 30s
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
apps/api/src/instrumentation.ts (1)
21-32: ⚡ Quick winAdd
MeterProvider.shutdown()on process exit to flush the final metrics batch.Without SIGTERM/SIGINT handlers, in-memory counter increments accumulated since the last 30-second export are silently dropped on every graceful shutdown (e.g., rolling deploy, pod eviction). The commented-out signal handler stubs in
index.tssuggest this was planned. The official OTel JS documentation for the OTLP exporters shows exactly this pattern:['SIGINT', 'SIGTERM'].forEach(signal => { process.on(signal, () => meterProvider.shutdown().catch(console.error)); })♻️ Proposed fix
metrics.setGlobalMeterProvider(meterProvider); logger.info(`OTel metrics enabled (endpoint: ${metricsUrl})`); + + const shutdown = () => + meterProvider.shutdown().catch((err) => logger.error("OTel metrics shutdown error", err)); + process.on("SIGTERM", shutdown); + process.on("SIGINT", shutdown); } else {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/api/src/instrumentation.ts` around lines 21 - 32, When metricsUrl is set and you create the MeterProvider, wire up process SIGINT/SIGTERM handlers to call meterProvider.shutdown() so the PeriodicExportingMetricReader flushes pending metrics on exit; add ['SIGINT','SIGTERM'].forEach(signal => process.on(signal, () => meterProvider.shutdown().catch(err => console.error('meterProvider.shutdown error', err)))) (ensure handlers reference the created meterProvider instance and only register when meterProvider is defined) so final batches from OTLPMetricExporter are exported before process termination.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@apps/api/src/instrumentation.ts`:
- Around line 21-32: When metricsUrl is set and you create the MeterProvider,
wire up process SIGINT/SIGTERM handlers to call meterProvider.shutdown() so the
PeriodicExportingMetricReader flushes pending metrics on exit; add
['SIGINT','SIGTERM'].forEach(signal => process.on(signal, () =>
meterProvider.shutdown().catch(err => console.error('meterProvider.shutdown
error', err)))) (ensure handlers reference the created meterProvider instance
and only register when meterProvider is defined) so final batches from
OTLPMetricExporter are exported before process termination.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: efef17d7-c6d7-4154-87d4-126f0ea6e132
⛔ Files ignored due to path filters (1)
pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (6)
apps/api/package.jsonapps/api/src/config.tsapps/api/src/index.tsapps/api/src/instrumentation.tsapps/api/src/middleware/metrics.tsapps/api/src/server.ts
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@apps/api/src/middleware/metrics.ts`:
- Around line 45-47: The metric label construction that returns
`${name}/${version}` for members of VERSIONED_CLIENT_ALLOWLIST must validate the
version string to prevent high-cardinality labels; update the code around the
label-building logic (the expression that uses
VERSIONED_CLIENT_ALLOWLIST.has(name) and `version`) to only include the version
when it matches a bounded pattern (e.g. semver-like regex or a strict
allowed-set) and otherwise fall back to a stable token such as just `name` or
`name/unknown`; ensure the validation is implemented as a small helper (e.g.
isValidClientVersion(version)) and used before interpolating the version into
the metric label.
- Around line 56-66: getCounter currently caches a noop Counter if called before
a real meter provider is registered, causing all subsequent increments to be
silently dropped; update getCounter (and the module-level counter) to detect
that the global provider is not configured (or that the cached counter is a
noop) and either assert/throw to fail-fast or clear and recreate the counter
once metrics.setGlobalMeterProvider() has been called — specifically modify
getCounter, counter, and the logic around
metrics.getMeter("ctrlplane-api").createCounter("http.server.requests_by_client")
so the function checks provider state (or noop-ness) before returning the cached
counter and resets/recreates it when a real provider becomes available.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: fb87988c-dbe9-4670-a42f-2a90409830b9
📒 Files selected for processing (3)
apps/api/src/instrumentation.tsapps/api/src/middleware/metrics.tspackages/workspace-engine-sdk/eslint.config.js
🚧 Files skipped from review as they are similar to previous changes (1)
- apps/api/src/instrumentation.ts
| return VERSIONED_CLIENT_ALLOWLIST.has(name) && version | ||
| ? `${name}/${version}` | ||
| : name; |
There was a problem hiding this comment.
Unvalidated version string will cause metric-label cardinality explosion.
The raw version substring is passed directly to the OTel counter label for allowlisted clients:
return VERSIONED_CLIENT_ALLOWLIST.has(name) && version
? `${name}/${version}`
: name;Any client sending User-Agent: ctrlc/<UUID>, ctrlc/<unix-timestamp>, or any high-cardinality string will generate a unique metric label combination per request. Most time-series backends (Prometheus, Thanos, Victoria Metrics) have hard cardinality limits; exceeding them causes OOM crashes or label truncation silently.
Validate version against a bounded pattern (e.g. semver-like) before using it as a label:
🛡️ Proposed fix
+ const VALID_VERSION_RE = /^\d{1,4}\.\d{1,4}(?:\.\d{1,4})?(?:[-+][a-zA-Z0-9._-]{0,32})?$/;
...
- return VERSIONED_CLIENT_ALLOWLIST.has(name) && version
- ? `${name}/${version}`
- : name;
+ if (VERSIONED_CLIENT_ALLOWLIST.has(name) && version && VALID_VERSION_RE.test(version))
+ return `${name}/${version}`;
+ return name;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@apps/api/src/middleware/metrics.ts` around lines 45 - 47, The metric label
construction that returns `${name}/${version}` for members of
VERSIONED_CLIENT_ALLOWLIST must validate the version string to prevent
high-cardinality labels; update the code around the label-building logic (the
expression that uses VERSIONED_CLIENT_ALLOWLIST.has(name) and `version`) to only
include the version when it matches a bounded pattern (e.g. semver-like regex or
a strict allowed-set) and otherwise fall back to a stable token such as just
`name` or `name/unknown`; ensure the validation is implemented as a small helper
(e.g. isValidClientVersion(version)) and used before interpolating the version
into the metric label.
Summary by CodeRabbit