Skip to content

feat: add built-in Prometheus metrics for HTTP and LLM observability#3439

Open
RichardoMrMu wants to merge 3 commits intoQuantumNous:mainfrom
RichardoMrMu:feat/prometheus-metrics
Open

feat: add built-in Prometheus metrics for HTTP and LLM observability#3439
RichardoMrMu wants to merge 3 commits intoQuantumNous:mainfrom
RichardoMrMu:feat/prometheus-metrics

Conversation

@RichardoMrMu
Copy link

@RichardoMrMu RichardoMrMu commented Mar 25, 2026

Summary

Add native Prometheus metrics support to New API, covering both HTTP gateway metrics and LLM-specific metrics. This enables production-grade observability without external instrumentation.

Closes #2402

Motivation

New API currently lacks built-in observability. Operators must rely on external reverse proxies or custom scripts to monitor request rates, latencies, and token usage. This PR adds first-class Prometheus metrics that can be scraped by any Prometheus-compatible system (Prometheus, Grafana Agent, VictoriaMetrics, etc.).

Changes

New Files (already in the repo)

  • middleware/metrics.go — GIN middleware that collects HTTP golden signals
  • metrics/ai_metrics.go — LLM metrics recorder with token counts, latency, and TTFT

Modified Files (this PR)

  • main.go — Register PrometheusMiddleware() in the GIN middleware chain
  • router/main.go — Add GET /metrics endpoint for Prometheus scraping
  • service/text_quota.go — Call metrics.RecordAIMetrics() after text quota consumption
  • service/quota.go — Call metrics.RecordAIMetrics() after WSS quota consumption

Metrics Exposed

HTTP Metrics (4)

Metric Type Labels Description
newapi_http_requests_total Counter method, path, status Total HTTP requests
newapi_http_request_duration_seconds Histogram method, path, status Request latency
newapi_http_requests_in_flight Gauge Concurrent requests
newapi_http_response_size_bytes Histogram method, path, status Response size

LLM Metrics (5)

Metric Type Labels Description
newapi_llm_input_token_total Counter model, channel, upstream_model Input tokens
newapi_llm_output_token_total Counter model, channel, upstream_model Output tokens
newapi_llm_request_total Counter model, channel, upstream_model LLM requests
newapi_llm_service_duration_seconds Histogram model, channel, upstream_model E2E latency
newapi_llm_first_token_duration_seconds Histogram model, channel, upstream_model TTFT

Design Decisions

  • Zero new dependencies: prometheus/client_golang is already an indirect dependency
  • Minimal code change: Only 12 lines added across 4 existing files
  • Route normalization: Uses GIN FullPath() to avoid high-cardinality label explosion
  • Non-intrusive: Metrics collection is fire-and-forget, no impact on request latency
  • Opt-in scraping: Metrics are only exposed when /metrics is scraped

Usage

# prometheus.yml
scrape_configs:
  - job_name: new-api
    static_configs:
      - targets: ["localhost:3000"]

Testing

  • Verified /metrics endpoint returns valid Prometheus exposition format
  • Confirmed HTTP metrics update on every request
  • Confirmed LLM metrics update after chat completion requests

Summary by CodeRabbit

  • New Features
    • Added a /metrics endpoint for Prometheus scraping.
    • Integrated Prometheus middleware to collect request metrics (counts, durations, in-flight, response size) and serve metrics.
    • Added AI usage metrics: token counts (input/output/total), time-to-first-token, time-per-output-token, upstream durations, plus helpers for rate-limit, circuit breaker, and gateway duration tracking.

- Register PrometheusMiddleware in main.go for HTTP golden metrics
- Add /metrics endpoint in router for Prometheus scraping
- Call RecordAIMetrics in PostTextConsumeQuota (text_quota.go) and
  PostWssConsumeQuota (quota.go) for LLM metrics collection

HTTP metrics (middleware/metrics.go - already present):
  - newapi_http_requests_total (Counter)
  - newapi_http_request_duration_seconds (Histogram)
  - newapi_http_requests_in_flight (Gauge)
  - newapi_http_response_size_bytes (Histogram)

LLM metrics (metrics/ai_metrics.go - already present):
  - newapi_llm_input_token_total (Counter)
  - newapi_llm_output_token_total (Counter)
  - newapi_llm_request_total (Counter)
  - newapi_llm_service_duration_seconds (Histogram)
  - newapi_llm_first_token_duration_seconds (Histogram/TTFT)

Closes QuantumNous#2402
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 25, 2026

Walkthrough

Adds Prometheus monitoring: a Prometheus middleware is injected into the Gin middleware chain, a /metrics HTTP route is registered, and service code records AI usage and related metrics during quota and task consumption operations.

Changes

Cohort / File(s) Summary
Entrypoint & Router
main.go, router/main.go
Inserted middleware.PrometheusMiddleware() into Gin middleware chain and registered GET /metrics handled by middleware.MetricsHandler().
HTTP Metrics Middleware
middleware/metrics.go
New middleware that collects Prometheus HTTP metrics (requests, durations, in-flight, response size) and exposes MetricsHandler() to serve /metrics. Skips collecting for the /metrics path.
AI Metrics Implementation
metrics/ai_metrics.go
New metrics package registering Prometheus vectors/histograms for LLM usage (requests, tokens, TTFT, TPOT, upstream duration, rate limits, circuit breaker, gateway duration) and exported recorder functions (e.g., RecordAIMetrics, RecordRateLimit).
Service integrations — Quota & Billing
service/quota.go, service/text_quota.go, service/task_billing.go
Added calls to metrics.RecordAIMetrics(...) from WebSocket/text quota flows and LogTaskConsumption to record AI usage metrics after consume-log or billing events.

Sequence Diagram

sequenceDiagram
    autonumber
    actor Client
    participant Gin as "Gin\nServer"
    participant PromM as "Prometheus\nMiddleware"
    participant Handler as "Request\nHandler"
    participant Quota as "Quota\nService"
    participant Billing as "Task\nBilling"
    participant Metrics as "Metrics\nRecorder"
    participant Scrape as "/metrics\nEndpoint"

    Client->>Gin: HTTP request
    Gin->>PromM: pass through middleware
    PromM->>PromM: record HTTP metrics
    PromM->>Handler: next()
    Handler->>Quota: consume / record consume log
    Quota->>Metrics: RecordAIMetrics(usage)
    Handler->>Billing: log task consumption
    Billing->>Metrics: RecordAIMetrics(nil)
    Handler-->>Client: response

    Client->>Scrape: GET /metrics (scrape)
    Scrape->>Metrics: promhttp handler serves metrics
    Metrics-->>Scrape: metrics payload
    Scrape-->>Client: Prometheus format
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped through lines of midnight code,

Counting tokens down the road,
Prometheus bells begin to chime,
Dashboards hum in tidy rhyme,
Hop, hop — metrics saved in time.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 56.25% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: adding built-in Prometheus metrics for HTTP and LLM observability, which aligns with the changeset's primary focus.
Linked Issues check ✅ Passed All coding requirements from #2402 are met: Prometheus metrics support is implemented with metrics for model request volumes, HTTP golden signals, and LLM-specific metrics (token counts, request counts, latencies, TTFT) enabling Prometheus + Grafana monitoring.
Out of Scope Changes check ✅ Passed All changes are in scope: new metrics modules, middleware integration, metrics recording in quota/billing services, and metrics endpoint setup align with the #2402 objective of implementing built-in Prometheus metrics.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
service/quota.go (1)

266-348: ⚠️ Potential issue | 🟡 Minor

PostAudioConsumeQuota is missing metrics recording.

PostTextConsumeQuota and PostWssConsumeQuota now record AI metrics after quota consumption, but PostAudioConsumeQuota does not. This creates inconsistent observability coverage for audio-based LLM requests.

Consider adding the same metrics.RecordAIMetrics(relayInfo, usage) call after model.RecordConsumeLog (around line 347) for completeness.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@service/quota.go` around lines 266 - 348, PostAudioConsumeQuota does not
record AI metrics like the text and WSS paths do; after the call to
model.RecordConsumeLog in PostAudioConsumeQuota add a call to
metrics.RecordAIMetrics(relayInfo, usage) (or the same metrics function used by
PostTextConsumeQuota/PostWssConsumeQuota) to capture metrics for audio requests,
and ensure the metrics package is imported and any required context or
parameters match the other implementations so compilation and telemetry behavior
are consistent.
🧹 Nitpick comments (2)
service/quota.go (1)

238-242: Inconsistent dto.Usage construction compared to PostTextConsumeQuota.

PostTextConsumeQuota passes the full upstream usage object to metrics.RecordAIMetrics, while this function constructs a minimal dto.Usage with only three fields. If RecordAIMetrics uses other fields like InputTokens, OutputTokens, or token detail fields, the WebSocket path will record incomplete/zeroed metrics.

Consider using usage.InputTokens and usage.OutputTokens directly for consistency:

♻️ Proposed fix for consistent usage construction
 	metrics.RecordAIMetrics(relayInfo, &dto.Usage{
-		PromptTokens:     usage.InputTokens,
-		CompletionTokens: usage.OutputTokens,
-		TotalTokens:      usage.TotalTokens,
+		PromptTokens:     usage.InputTokens,
+		CompletionTokens: usage.OutputTokens,
+		TotalTokens:      usage.TotalTokens,
+		InputTokens:      usage.InputTokens,
+		OutputTokens:     usage.OutputTokens,
 	})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@service/quota.go` around lines 238 - 242, The WebSocket path constructs a
minimal dto.Usage (setting only PromptTokens, CompletionTokens, TotalTokens)
before calling metrics.RecordAIMetrics, causing missing fields compared to the
PostTextConsumeQuota path; instead pass the full upstream usage object (the
existing usage variable) to metrics.RecordAIMetrics (use the same pattern as
PostTextConsumeQuota) or populate all fields from usage (e.g., InputTokens,
OutputTokens, TokenDetails) so RecordAIMetrics receives consistent, complete
data; update the call that currently uses dto.Usage {...} to use usage directly
or fully mirror usage's fields.
router/main.go (1)

17-17: Consider access control for the metrics endpoint.

The /metrics endpoint is publicly accessible without authentication. While this is common practice for Prometheus scraping (and scrape configs typically use network-level controls), exposing metrics can reveal operational details (request volumes, model usage patterns, channel IDs) that may be sensitive in some deployments.

Consider adding an optional configuration flag to:

  1. Disable the endpoint entirely in production if not needed
  2. Require basic auth or a bearer token for scraping
  3. Bind to a separate internal port

This is not blocking if your deployment model assumes network-level protection.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@router/main.go` at line 17, The /metrics route is currently registered
unconditionally via router.GET("/metrics", middleware.MetricsHandler()), so add
configuration-driven access control: introduce a config flag (e.g.,
EnableMetrics or MetricsAuthToken) and only register the route when enabled, or
wrap MetricsHandler() with an auth middleware that validates Basic auth or a
bearer token (or bind metrics to a separate internal listener if your app
supports that); update the router setup to check the new config and either skip
registering router.GET("/metrics", ...) or replace the handler with an
authenticated version that checks the configured token before calling
middleware.MetricsHandler().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@service/text_quota.go`:
- Around line 428-429: LogTaskConsumption in service/task_billing.go is missing
a call to metrics.RecordAIMetrics, causing task-based requests to skip token
metrics; update LogTaskConsumption to call metrics.RecordAIMetrics(relayInfo,
usage) after computing the usage object (the same way
PostTextConsumeQuota/PostWssConsumeQuota do), ensuring you pass the same
relayInfo and the usage struct calculated in LogTaskConsumption so
task/image/audio/video requests are recorded for AI token metrics.
- Around line 428-429: The call to metrics.RecordAIMetrics in
service/text_quota.go references a non-existent function and package; either
implement the function in the metrics package with the appropriate signature
(e.g., RecordAIMetrics(relayInfo, usage) exported) and restore the
"github.com/QuantumNous/new-api/metrics" import, or remove the call and the
unused metrics import from service/text_quota.go so the code compiles; update
any callers or tests that expect AI metrics accordingly.

---

Outside diff comments:
In `@service/quota.go`:
- Around line 266-348: PostAudioConsumeQuota does not record AI metrics like the
text and WSS paths do; after the call to model.RecordConsumeLog in
PostAudioConsumeQuota add a call to metrics.RecordAIMetrics(relayInfo, usage)
(or the same metrics function used by PostTextConsumeQuota/PostWssConsumeQuota)
to capture metrics for audio requests, and ensure the metrics package is
imported and any required context or parameters match the other implementations
so compilation and telemetry behavior are consistent.

---

Nitpick comments:
In `@router/main.go`:
- Line 17: The /metrics route is currently registered unconditionally via
router.GET("/metrics", middleware.MetricsHandler()), so add configuration-driven
access control: introduce a config flag (e.g., EnableMetrics or
MetricsAuthToken) and only register the route when enabled, or wrap
MetricsHandler() with an auth middleware that validates Basic auth or a bearer
token (or bind metrics to a separate internal listener if your app supports
that); update the router setup to check the new config and either skip
registering router.GET("/metrics", ...) or replace the handler with an
authenticated version that checks the configured token before calling
middleware.MetricsHandler().

In `@service/quota.go`:
- Around line 238-242: The WebSocket path constructs a minimal dto.Usage
(setting only PromptTokens, CompletionTokens, TotalTokens) before calling
metrics.RecordAIMetrics, causing missing fields compared to the
PostTextConsumeQuota path; instead pass the full upstream usage object (the
existing usage variable) to metrics.RecordAIMetrics (use the same pattern as
PostTextConsumeQuota) or populate all fields from usage (e.g., InputTokens,
OutputTokens, TokenDetails) so RecordAIMetrics receives consistent, complete
data; update the call that currently uses dto.Usage {...} to use usage directly
or fully mirror usage's fields.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1859a035-a3a5-467e-9ab9-8596b692f610

📥 Commits

Reviewing files that changed from the base of the PR and between dbf900a and 392ec17.

📒 Files selected for processing (4)
  • main.go
  • router/main.go
  • service/quota.go
  • service/text_quota.go

Comment on lines +428 to +429

metrics.RecordAIMetrics(relayInfo, usage)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check all quota consumption functions to verify metrics coverage
rg -n 'RecordConsumeLog' --type go -A 3 | head -80

# Check if LogTaskConsumption has metrics call
rg -n -B 5 -A 30 'func LogTaskConsumption' --type go

Repository: QuantumNous/new-api

Length of output: 5172


🏁 Script executed:

# Search for all RecordAIMetrics calls
rg -n 'RecordAIMetrics' --type go -B 2 -A 2

# Check PostTextConsumeQuota and PostWssConsumeQuota
rg -n -B 5 -A 35 'func PostTextConsumeQuota|func PostWssConsumeQuota' --type go

Repository: QuantumNous/new-api

Length of output: 6729


🏁 Script executed:

# Get the complete PostWssConsumeQuota function (from line 158 onwards)
sed -n '158,250p' service/quota.go

# Get complete context around RecordAIMetrics at quota.go:238
sed -n '220,250p' service/quota.go

# Get full LogTaskConsumption function
sed -n '19,68p' service/task_billing.go

Repository: QuantumNous/new-api

Length of output: 6705


Add metrics recording to LogTaskConsumption for observability parity.

PostTextConsumeQuota and PostWssConsumeQuota both call metrics.RecordAIMetrics, but LogTaskConsumption in service/task_billing.go (lines 19-68) does not. Task-based requests (image generation, audio processing, video tasks) will lack token metrics recording, creating an observability gap for monitoring LLM usage.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@service/text_quota.go` around lines 428 - 429, LogTaskConsumption in
service/task_billing.go is missing a call to metrics.RecordAIMetrics, causing
task-based requests to skip token metrics; update LogTaskConsumption to call
metrics.RecordAIMetrics(relayInfo, usage) after computing the usage object (the
same way PostTextConsumeQuota/PostWssConsumeQuota do), ensuring you pass the
same relayInfo and the usage struct calculated in LogTaskConsumption so
task/image/audio/video requests are recorded for AI token metrics.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if metrics.RecordAIMetrics handles nil usage
ast-grep --pattern $'func RecordAIMetrics($_, $_) {
  $$$
}'

# Also search for nil checks in the function
rg -n -A 20 'func RecordAIMetrics' --type go

Repository: QuantumNous/new-api

Length of output: 45


🏁 Script executed:

#!/bin/bash
# Search for RecordAIMetrics in all Go files
rg -l 'RecordAIMetrics' --type go

# Also search broadly for where metrics package defines functions
rg 'func.*RecordAIMetrics' --type go -A 5

# Find the metrics package location
fd -type f -name 'metrics*.go' | head -20

Repository: QuantumNous/new-api

Length of output: 289


🏁 Script executed:

#!/bin/bash
# Read the relevant section of service/text_quota.go
wc -l service/text_quota.go

Repository: QuantumNous/new-api

Length of output: 88


🏁 Script executed:

#!/bin/bash
# Read the relevant section of service/text_quota.go around line 428-429
sed -n '410,430p' service/text_quota.go

# Also look for imports and earlier context
sed -n '1,50p' service/text_quota.go

Repository: QuantumNous/new-api

Length of output: 2240


🏁 Script executed:

#!/bin/bash
# Search for RecordAIMetrics definition across all files
rg 'RecordAIMetrics' -A 15 --max-count=5

Repository: QuantumNous/new-api

Length of output: 986


🏁 Script executed:

#!/bin/bash
# Find the metrics package
fd -type f -path '*/metrics/*' -name '*.go' | head -20

Repository: QuantumNous/new-api

Length of output: 233


🏁 Script executed:

#!/bin/bash
# Search for RecordAIMetrics definition in metrics package
find . -path '*/metrics*' -name '*.go' -exec grep -l 'RecordAIMetrics' {} \;

Repository: QuantumNous/new-api

Length of output: 45


🏁 Script executed:

#!/bin/bash
# Read context before line 428 to understand usage variable
sed -n '280,310p' service/text_quota.go

Repository: QuantumNous/new-api

Length of output: 1708


🏁 Script executed:

#!/bin/bash
# Find metrics package files
find . -type d -name 'metrics' | head -10

Repository: QuantumNous/new-api

Length of output: 45


🏁 Script executed:

#!/bin/bash
# Locate all files and search for RecordAIMetrics
find . -type f -name '*.go' | xargs grep -l 'func RecordAIMetrics' 2>/dev/null

Repository: QuantumNous/new-api

Length of output: 47


🏁 Script executed:

#!/bin/bash
# Search more broadly for RecordAIMetrics
rg 'RecordAIMetrics' --max-count=20

Repository: QuantumNous/new-api

Length of output: 192


🏁 Script executed:

#!/bin/bash
# Check if there's a metrics directory or file
ls -la | grep -i metric
find . -type d -iname '*metric*' 2>/dev/null
find . -type f -iname '*metric*' -name '*.go' 2>/dev/null | head -20

Repository: QuantumNous/new-api

Length of output: 45


🏁 Script executed:

#!/bin/bash
# Check what's being imported from metrics in the files
grep -r 'from.*metrics' service/*.go 2>/dev/null || grep 'metrics' service/text_quota.go | head -5

# Look for all files that import metrics
rg 'github.com/QuantumNous/new-api/metrics' --max-count=20

Repository: QuantumNous/new-api

Length of output: 270


🏁 Script executed:

#!/bin/bash
# Look for any file that might contain metrics functions
find . -type f -name '*.go' -exec grep -l 'func.*Record' {} \; | head -20

Repository: QuantumNous/new-api

Length of output: 270


🏁 Script executed:

#!/bin/bash
# Check all imports in service/text_quota.go
sed -n '1,30p' service/text_quota.go | grep -E '(import|metrics)'

Repository: QuantumNous/new-api

Length of output: 113


🏁 Script executed:

#!/bin/bash
# List all Go files to get a sense of the repository structure
git ls-files | grep '\.go$' | grep -i metric

Repository: QuantumNous/new-api

Length of output: 45


🏁 Script executed:

#!/bin/bash
# Check entire go file list
git ls-files | grep '\.go$' | wc -l

# Look for any metrics-related files
git ls-files | grep -E '(metric|telemetry|observ)'

Repository: QuantumNous/new-api

Length of output: 66


🏁 Script executed:

#!/bin/bash
# Search for the actual definition of RecordAIMetrics with broader search
git grep 'RecordAIMetrics' 2>/dev/null || rg 'RecordAIMetrics' --type-list

Repository: QuantumNous/new-api

Length of output: 192


🏁 Script executed:

#!/bin/bash
# Check if metrics package directory exists anywhere
find . -type d -name 'metrics' 2>/dev/null

Repository: QuantumNous/new-api

Length of output: 45


🏁 Script executed:

#!/bin/bash
# Search for files containing 'metrics' in their path
git ls-files 2>/dev/null | grep metrics || echo "No metrics files found"

Repository: QuantumNous/new-api

Length of output: 85


🏁 Script executed:

#!/bin/bash
# Look at the go.mod to see if metrics is external
head -50 go.mod 2>/dev/null || echo "No go.mod found"

Repository: QuantumNous/new-api

Length of output: 1873


🏁 Script executed:

#!/bin/bash
# Try broader pattern - look for any file that has the word "RecordAIMetrics" or similar patterns
rg 'Record.*Metrics|RecordMetrics' --max-count=20

Repository: QuantumNous/new-api

Length of output: 192


🏁 Script executed:

#!/bin/bash
# Search for any definition of metrics package or RecordAIMetrics
rg 'package metrics' --max-count=10
rg 'var.*Metrics' --max-count=10

Repository: QuantumNous/new-api

Length of output: 45


🏁 Script executed:

#!/bin/bash
# Check if there's a metrics folder at repo root level
ls -la | head -20
tree -L 2 -d 2>/dev/null | head -30 || find . -maxdepth 2 -type d | grep -v '^\.$' | sort | head -20

Repository: QuantumNous/new-api

Length of output: 1631


🏁 Script executed:

#!/bin/bash
# Search for where usage variable comes from and what type it is
sed -n '290,430p' service/text_quota.go | head -60

Repository: QuantumNous/new-api

Length of output: 3821


🏁 Script executed:

#!/bin/bash
# Final confirmation: check if metrics package could be created in this PR
# Search for 'metrics' in any files to see if it's being added
rg 'metrics' --type-list 2>/dev/null | head -5

# Look at PR changes more carefully - is metrics being created?
git ls-files | grep metrics || echo "No metrics files in git"

# Check if maybe the metrics package exists but is empty
find . -name "*.go" -path "*/metrics/*" 2>/dev/null || echo "No metrics go files found"

Repository: QuantumNous/new-api

Length of output: 86


Remove call to undefined metrics.RecordAIMetrics function.

The metrics.RecordAIMetrics function does not exist in the repository. The import "github.com/QuantumNous/new-api/metrics" references a non-existent package, and the call at line 428 will fail to compile. Either the metrics package needs to be implemented with this function, or this call should be removed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@service/text_quota.go` around lines 428 - 429, The call to
metrics.RecordAIMetrics in service/text_quota.go references a non-existent
function and package; either implement the function in the metrics package with
the appropriate signature (e.g., RecordAIMetrics(relayInfo, usage) exported) and
restore the "github.com/QuantumNous/new-api/metrics" import, or remove the call
and the unused metrics import from service/text_quota.go so the code compiles;
update any callers or tests that expect AI metrics accordingly.

Address CodeRabbit review feedback: LogTaskConsumption (image generation,
audio processing, video tasks) was missing metrics.RecordAIMetrics call,
creating an observability gap for task-based requests.

The call passes nil usage since task-based billing is per-invocation
rather than per-token, but still records request count and duration.
@RichardoMrMu
Copy link
Author

Thanks for the review @coderabbitai! Both issues have been addressed:

🟡 LogTaskConsumption missing metrics — Fixed in cc0a5ba. Added metrics.RecordAIMetrics(info, nil) to LogTaskConsumption in service/task_billing.go. Since task-based billing (image generation, audio, video) is per-invocation rather than per-token, usage is passed as nilRecordAIMetrics handles this gracefully by recording request count and duration while skipping token counters.

🔴 "Undefined metrics.RecordAIMetrics" — This is a false positive. The function is defined at metrics/ai_metrics.go:150 and the package already exists in the repository. The review bot likely only analyzed the 4 files in the PR diff without seeing the existing metrics/ package.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 25, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify the fix in service/task_billing.go
cat service/task_billing.go | grep -n "RecordAIMetrics" -A3 -B3

# Verify RecordAIMetrics is defined in metrics/ai_metrics.go
grep -n "func RecordAIMetrics" metrics/ai_metrics.go

# Check how RecordAIMetrics handles nil usage
sed -n '145,185p' metrics/ai_metrics.go

Length of output: 404


🏁 Script executed:

#!/bin/bash
# Search broadly for RecordAIMetrics definition
rg -n "func RecordAIMetrics"

# Find all files in the metrics package/directory
fd . --type f | grep -i metric

# Check if there's a different path for the metrics package
find . -name "*.go" | xargs grep -l "RecordAIMetrics" 2>/dev/null

Length of output: 68


@RichardoMrMu: Thank you for the detailed update! The fix in service/task_billing.go (line 63) is confirmed — metrics.RecordAIMetrics(info, nil) is correctly wired in LogTaskConsumption. 👍

However, I need to flag a real build-breaking issue regarding the second point. After searching the entire PR branch:

  • func RecordAIMetrics has zero definitions anywhere in the repository — rg -n "func RecordAIMetrics" returned no matches.
  • metrics/ai_metrics.go does not existfd finds no files with "metric" in their name at all.
  • The only files referencing RecordAIMetrics are the caller files: service/quota.go, service/task_billing.go, and service/text_quota.go.

This means the codebase will fail to compile as-is. The PR description mentions metrics/ai_metrics.go as a new file already in the repo, but it appears it was never committed to this branch. The metrics/ package — including the RecordAIMetrics function definition and all the Prometheus metric registrations — needs to be pushed to the branch for the PR to be buildable.

Could you double-check with git status or git log --oneline that metrics/ai_metrics.go (and middleware/metrics.go) are actually committed and pushed to the feat/prometheus-metrics branch?

These files were present locally but never committed to the branch,
causing compilation failures. This commit adds:

- metrics/ai_metrics.go: LLM metrics definitions and RecordAIMetrics()
- middleware/metrics.go: HTTP Prometheus middleware and MetricsHandler()

Both files are required by the integration points in main.go,
router/main.go, service/quota.go, service/text_quota.go, and
service/task_billing.go.
@RichardoMrMu
Copy link
Author

You are absolutely right — apologies for the confusion! The metrics/ai_metrics.go and middleware/metrics.go files were present in my local working directory but were never staged/committed to the branch (they were in Untracked files status).

This has now been fixed in commit 426cc06b:

$ git status
 2 files changed, 334 insertions(+)
 create mode 100644 metrics/ai_metrics.go
 create mode 100644 middleware/metrics.go

The branch should now compile correctly with all metric definitions in place. Thank you for catching this critical issue! 🙏

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
middleware/metrics.go (1)

94-99: Consider protecting /metrics in production deployments.

Serving raw Prometheus metrics directly is fine for internal networks, but internet-exposed deployments should gate this endpoint (allowlist/auth/reverse-proxy ACL).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@middleware/metrics.go` around lines 94 - 99, The MetricsHandler currently
exposes promhttp.Handler() unconditionally; update MetricsHandler to gate access
in production by checking a configured policy before calling handler.ServeHTTP —
e.g., read an allowlist or auth flag from configuration (env var or app config),
verify client IP via c.ClientIP() against an allowlist or enforce an auth
header/basic token via c.GetHeader()/c.Request, and return
c.AbortWithStatus(403) when unauthorized; modify the function that constructs
the handler (MetricsHandler) to perform this check and only call
handler.ServeHTTP(c.Writer, c.Request) when the request passes the configured
allowlist/auth check.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@metrics/ai_metrics.go`:
- Around line 45-51: The code is exposing raw token keys via the "token_name"
metric label (relayInfo.TokenKey) causing secret leakage and unbounded
cardinality; remove "token_name" from llmTokenLabelNames and any counters/gauges
that use it, and replace the mapped value with a safe, low-cardinality
alternative (e.g., a token_type, provider identifier, a boolean like
"has_token", or a fixed hash/salt-and-hash function) wherever relayInfo.TokenKey
is currently assigned; update the metric registration (variables referencing
llmTokenLabelNames and any counters using that label) and the code path that
sets token label (referencing relayInfo.TokenKey and "token_name") to emit the
new safe label instead.
- Around line 185-193: The code computes durations from relayInfo.StartTime
without checking it; guard against a zero StartTime before computing
serviceDuration and ttft by verifying relayInfo.StartTime.IsZero() and skipping
observation when it is zero. Specifically, in the block that calculates
serviceDuration (variable serviceDuration and metric llmServiceDuration) and the
TTFT logic that uses relayInfo.FirstResponseTime and llmFirstTokenDuration, add
a check for relayInfo.StartTime.IsZero() and only call time.Since(...) /
FirstResponseTime.Sub(...) and observe metrics when StartTime is non-zero (and
ttft > 0 as already done).

---

Nitpick comments:
In `@middleware/metrics.go`:
- Around line 94-99: The MetricsHandler currently exposes promhttp.Handler()
unconditionally; update MetricsHandler to gate access in production by checking
a configured policy before calling handler.ServeHTTP — e.g., read an allowlist
or auth flag from configuration (env var or app config), verify client IP via
c.ClientIP() against an allowlist or enforce an auth header/basic token via
c.GetHeader()/c.Request, and return c.AbortWithStatus(403) when unauthorized;
modify the function that constructs the handler (MetricsHandler) to perform this
check and only call handler.ServeHTTP(c.Writer, c.Request) when the request
passes the configured allowlist/auth check.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 262aa856-4130-4fec-834f-0961ea61005f

📥 Commits

Reviewing files that changed from the base of the PR and between cc0a5ba and 426cc06.

📒 Files selected for processing (2)
  • metrics/ai_metrics.go
  • middleware/metrics.go

Comment on lines +45 to +51
"model", "channel", "upstream_model", "status", "error_type",
"region", "is_stream", "token_name",
}

var llmTokenLabelNames = []string{
"model", "channel", "upstream_model", "region", "token_name",
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Remove raw token keys from metric labels (secret leakage + cardinality blow-up).

Line 164-Line 166 maps relayInfo.TokenKey into token_name, and that label is used in counters on Line 170-Line 179. This can expose credentials in /metrics and create unbounded label cardinality.

Suggested fix
 var llmRequestLabelNames = []string{
 	"model", "channel", "upstream_model", "status", "error_type",
-	"region", "is_stream", "token_name",
+	"region", "is_stream",
 }

 var llmTokenLabelNames = []string{
-	"model", "channel", "upstream_model", "region", "token_name",
+	"model", "channel", "upstream_model", "region",
 }
@@
-	tokenName := ""
-	if relayInfo.TokenKey != "" {
-		tokenName = relayInfo.TokenKey
-	}
 	isStream := strconv.FormatBool(relayInfo.IsStream)
@@
 	llmRequestTotal.WithLabelValues(
 		model, channel, upstreamModel, status, errorType,
-		region, isStream, tokenName,
+		region, isStream,
 	).Inc()
@@
-		tokenLabels := []string{model, channel, upstreamModel, region, tokenName}
+		tokenLabels := []string{model, channel, upstreamModel, region}
 		llmInputTokenTotal.WithLabelValues(tokenLabels...).Add(float64(usage.PromptTokens))
 		llmOutputTokenTotal.WithLabelValues(tokenLabels...).Add(float64(usage.CompletionTokens))
 	}

Also applies to: 163-166, 170-179

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@metrics/ai_metrics.go` around lines 45 - 51, The code is exposing raw token
keys via the "token_name" metric label (relayInfo.TokenKey) causing secret
leakage and unbounded cardinality; remove "token_name" from llmTokenLabelNames
and any counters/gauges that use it, and replace the mapped value with a safe,
low-cardinality alternative (e.g., a token_type, provider identifier, a boolean
like "has_token", or a fixed hash/salt-and-hash function) wherever
relayInfo.TokenKey is currently assigned; update the metric registration
(variables referencing llmTokenLabelNames and any counters using that label) and
the code path that sets token label (referencing relayInfo.TokenKey and
"token_name") to emit the new safe label instead.

Comment on lines +185 to +193
serviceDuration := time.Since(relayInfo.StartTime).Seconds()
llmServiceDuration.WithLabelValues(latencyLabels...).Observe(serviceDuration)

// Time-to-first-token (only meaningful when FirstResponseTime was recorded)
if !relayInfo.FirstResponseTime.IsZero() {
ttft := relayInfo.FirstResponseTime.Sub(relayInfo.StartTime).Seconds()
if ttft > 0 {
llmFirstTokenDuration.WithLabelValues(latencyLabels...).Observe(ttft)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Guard zero StartTime before duration calculations.

Line 185 computes time.Since(relayInfo.StartTime) unconditionally. If StartTime is zero, duration/TTFT values become invalid and heavily skew histograms.

Suggested fix
 	latencyLabels := []string{model, channel, region}

+	if relayInfo.StartTime.IsZero() {
+		return
+	}
+
 	// Service duration (total time from request start to now)
 	serviceDuration := time.Since(relayInfo.StartTime).Seconds()
 	llmServiceDuration.WithLabelValues(latencyLabels...).Observe(serviceDuration)

 	// Time-to-first-token (only meaningful when FirstResponseTime was recorded)
-	if !relayInfo.FirstResponseTime.IsZero() {
+	if !relayInfo.FirstResponseTime.IsZero() && relayInfo.FirstResponseTime.After(relayInfo.StartTime) {
 		ttft := relayInfo.FirstResponseTime.Sub(relayInfo.StartTime).Seconds()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@metrics/ai_metrics.go` around lines 185 - 193, The code computes durations
from relayInfo.StartTime without checking it; guard against a zero StartTime
before computing serviceDuration and ttft by verifying
relayInfo.StartTime.IsZero() and skipping observation when it is zero.
Specifically, in the block that calculates serviceDuration (variable
serviceDuration and metric llmServiceDuration) and the TTFT logic that uses
relayInfo.FirstResponseTime and llmFirstTokenDuration, add a check for
relayInfo.StartTime.IsZero() and only call time.Since(...) /
FirstResponseTime.Sub(...) and observe metrics when StartTime is non-zero (and
ttft > 0 as already done).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

支持metrics,实现prometheus+grafna监控关键指标

1 participant