Add Prometheus /metrics endpoint for inference latency histogram #8

@raullenchai

Summary

Add a /metrics endpoint that exposes Prometheus-compatible metrics for monitoring inference performance in production.

Details

trio-core's FastAPI server (src/trio_core/api/server.py) currently has no observability endpoints. Adding Prometheus metrics would let users monitor inference latency, request counts, and error rates with standard tools (Grafana, Datadog, etc.).

What to Implement

  1. Add prometheus_client as an optional dependency in pyproject.toml (under an [observability] extra)
  2. Create a /metrics endpoint in the API server
  3. Expose at minimum:
    • trio_request_latency_seconds — Histogram for request duration, labeled by endpoint
    • trio_requests_total — Counter for total requests, labeled by endpoint and status
    • trio_inference_latency_seconds — Histogram specifically for model inference time
    • trio_active_requests — Gauge for currently-in-flight requests
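
As a starting point, the four metrics above could be declared roughly as follows. This is a minimal, framework-agnostic sketch and not the final implementation: the helper names (`track_request`, `render_metrics`, `PROMETHEUS_AVAILABLE`), the dedicated registry, and the "success"/"error" values for the status label are all assumptions made for illustration. The guarded import shows one way to keep prometheus_client optional.

```python
import time
from contextlib import contextmanager

# prometheus_client is optional: fall back to no-op behaviour if it is missing.
try:
    from prometheus_client import (
        CONTENT_TYPE_LATEST,
        CollectorRegistry,
        Counter,
        Gauge,
        Histogram,
        generate_latest,
    )
    PROMETHEUS_AVAILABLE = True
except ImportError:
    PROMETHEUS_AVAILABLE = False

# Bucket boundaries (seconds) taken from the acceptance criteria.
BUCKETS = (0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0)

if PROMETHEUS_AVAILABLE:
    # A dedicated registry (an assumption) keeps these metrics isolated in tests.
    REGISTRY = CollectorRegistry()
    REQUEST_LATENCY = Histogram(
        "trio_request_latency_seconds", "Request duration in seconds",
        ["endpoint"], buckets=BUCKETS, registry=REGISTRY,
    )
    REQUESTS_TOTAL = Counter(
        "trio_requests_total", "Total requests",
        ["endpoint", "status"], registry=REGISTRY,
    )
    INFERENCE_LATENCY = Histogram(
        "trio_inference_latency_seconds", "Model inference time in seconds",
        buckets=BUCKETS, registry=REGISTRY,
    )
    ACTIVE_REQUESTS = Gauge(
        "trio_active_requests", "Requests currently in flight",
        registry=REGISTRY,
    )


@contextmanager
def track_request(endpoint):
    """Hypothetical helper: record latency, count, and in-flight gauge."""
    if not PROMETHEUS_AVAILABLE:
        yield  # metrics disabled: run the wrapped code unchanged
        return
    ACTIVE_REQUESTS.inc()
    start = time.perf_counter()
    status = "success"  # assumed label values; HTTP status codes would also work
    try:
        yield
    except Exception:
        status = "error"
        raise
    finally:
        ACTIVE_REQUESTS.dec()
        REQUEST_LATENCY.labels(endpoint=endpoint).observe(time.perf_counter() - start)
        REQUESTS_TOTAL.labels(endpoint=endpoint, status=status).inc()


def render_metrics():
    """Return (body, content_type) for a /metrics response, or None if disabled."""
    if not PROMETHEUS_AVAILABLE:
        return None
    return generate_latest(REGISTRY), CONTENT_TYPE_LATEST
```

In server.py, an HTTP middleware could wrap each request in `track_request(...)`, and the /metrics route could return the result of `render_metrics()` with the given content type. `INFERENCE_LATENCY.time()` can be used as a context manager around the model call.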

Acceptance Criteria

  • GET /metrics returns Prometheus text format
  • Latency histogram has reasonable buckets (e.g., 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0 seconds)
  • Metrics are only enabled when prometheus_client is installed (graceful degradation)
  • At least one test verifying /metrics returns 200 with expected metric names
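
For the test criterion, one possible approach is to scan the response body for `# TYPE <name> <type>` lines, which the Prometheus text format uses to declare each metric. The sketch below is dependency-free; the helper names (`declared_metric_names`, `missing_metrics`) and the sample text are hypothetical and only illustrate the check. A real test would pass in the body returned by GET /metrics.

```python
# Metric names listed in this issue.
EXPECTED_METRICS = {
    "trio_request_latency_seconds",
    "trio_requests_total",
    "trio_inference_latency_seconds",
    "trio_active_requests",
}


def declared_metric_names(exposition_text):
    """Collect metric names from '# TYPE <name> <type>' lines."""
    names = set()
    for line in exposition_text.splitlines():
        parts = line.split()
        if len(parts) >= 4 and parts[0] == "#" and parts[1] == "TYPE":
            names.add(parts[2])
    return names


def missing_metrics(exposition_text, expected=EXPECTED_METRICS):
    """Return the expected metric names absent from the text, sorted."""
    return sorted(set(expected) - declared_metric_names(exposition_text))


# Illustrative sample only: two of the four metrics are declared.
SAMPLE = """\
# HELP trio_active_requests Requests currently in flight
# TYPE trio_active_requests gauge
trio_active_requests 0.0
# HELP trio_requests_total Total requests
# TYPE trio_requests_total counter
"""

print(missing_metrics(SAMPLE))
# → ['trio_inference_latency_seconds', 'trio_request_latency_seconds']
```

A test in tests/test_api.py could then assert that the status code is 200 and that `missing_metrics(response.text)` is an empty list.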

Files to Modify

  • pyproject.toml — add optional dependency
  • src/trio_core/api/server.py — add middleware and endpoint
  • tests/test_api.py — add test
