feat(metrics): add Prometheus per-tool metrics and ServiceMonitor #39

Open
yodem wants to merge 2 commits into main from feature/sc-41316/prometheus-metrics-and-servicemonitor

@yodem yodem commented Feb 19, 2026

Summary

Implements SC-41316: Add per-tool time-series metrics for the chatbot agent so usage can be graphed over time in Prometheus/Grafana.

Changes

  • Metrics module (server/chat/metrics.py): chatbot_tool_calls_total, chatbot_tool_duration_seconds, chatbot_tool_errors_total
  • Instrumentation: Tool handler in claude_service._build_sdk_tools records metrics on each tool invocation
  • Endpoint: GET /api/metrics exposes Prometheus text format
  • ServiceMonitor: manifests/servicemonitor.yaml for Prometheus Operator (scrapes /api/metrics)
  • Tests: server/chat/tests/test_metrics.py
  • Docs: docs/plans/metrics-for-chatbot-agent-tools.md, manifests/README.md
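
For reviewers who want a concrete picture of the metrics module and the handler instrumentation, here is a minimal sketch. It is not the code in this PR: the `tool` label, the `instrument` decorator, the dedicated `REGISTRY`, and the `render_metrics` helper are all assumptions made for illustration. Only the three metric names and the `prometheus-client` dependency come from the description above.

```python
import asyncio
import time
from functools import wraps

from prometheus_client import CollectorRegistry, Counter, Histogram, generate_latest

# Dedicated registry so the sketch does not touch the global default registry.
REGISTRY = CollectorRegistry()

# Metric names come from the PR; the "tool" label is an assumption.
TOOL_CALLS = Counter(
    "chatbot_tool_calls_total", "Tool invocations.", ["tool"], registry=REGISTRY
)
TOOL_ERRORS = Counter(
    "chatbot_tool_errors_total", "Tool invocations that raised.", ["tool"], registry=REGISTRY
)
TOOL_DURATION = Histogram(
    "chatbot_tool_duration_seconds", "Tool call latency in seconds.", ["tool"], registry=REGISTRY
)


def instrument(tool_name):
    """Hypothetical decorator: wrap an async tool handler and record metrics."""

    def decorator(handler):
        @wraps(handler)
        async def wrapper(*args, **kwargs):
            TOOL_CALLS.labels(tool=tool_name).inc()
            start = time.perf_counter()
            try:
                return await handler(*args, **kwargs)
            except Exception:
                TOOL_ERRORS.labels(tool=tool_name).inc()
                raise
            finally:
                # Duration is observed for successful and failed calls alike.
                TOOL_DURATION.labels(tool=tool_name).observe(time.perf_counter() - start)

        return wrapper

    return decorator


@instrument("search")
async def search(query):
    # Stand-in for a real tool handler.
    return f"results for {query}"


@instrument("broken")
async def broken():
    # Stand-in for a tool that fails.
    raise ValueError("boom")


def render_metrics() -> bytes:
    """Body a GET /api/metrics view could return (Prometheus text format)."""
    return generate_latest(REGISTRY)
```

Recording the duration in `finally` means failed calls still contribute to the latency histogram, which may or may not match what the PR does; it is worth confirming against `server/chat/metrics.py`.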

ServiceMonitor requirements

The target Service must have:

  • Label app.kubernetes.io/name: ai-chatbot
  • Port named http

See manifests/README.md for details.
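
To make the two requirements concrete, the sketch below builds a ServiceMonitor object as a Python dict and prints it as JSON (which `kubectl apply -f` also accepts). It is an illustration, not a copy of manifests/servicemonitor.yaml: the resource name and the `30s` interval are assumptions, while the label selector, the `http` port name, and the `/api/metrics` path come from this description.

```python
import json


def build_service_monitor(name="ai-chatbot", interval="30s"):
    """Return a ServiceMonitor spec; `name` and `interval` are assumed values."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name},
        "spec": {
            # Must match the label on the target Service.
            "selector": {"matchLabels": {"app.kubernetes.io/name": "ai-chatbot"}},
            # `port` refers to the Service port *name*, not a number.
            "endpoints": [{"port": "http", "path": "/api/metrics", "interval": interval}],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_service_monitor(), indent=2))
```

If the Service lacks the label or names its port differently, the selector or endpoint will not match and Prometheus will silently scrape nothing, which is why both requirements are called out.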

Fixes SC-41316

- Add chatbot_tool_calls_total, chatbot_tool_duration_seconds, chatbot_tool_errors_total
- Instrument tool handler in claude_service._build_sdk_tools
- Expose GET /api/metrics endpoint for Prometheus scraping
- Add ServiceMonitor manifest for Prometheus Operator
- Add prometheus-client dependency and tests

Fixes SC-41316

Co-authored-by: Cursor <cursoragent@cursor.com>
@shortcut-integration

This pull request has been linked to Shortcut Story #41316: MCP: add per-tool time-series metrics.

@coolify-sefaria-github

coolify-sefaria-github bot commented Feb 19, 2026

The preview deployment for sefaria/ai-chatbot:client is ready. 🟢

Last updated at: 2026-02-22 09:45:01 CET

- Introduced a new document outlining Prometheus metrics for tracking chatbot tool usage.
- Detailed metrics include `chatbot_tool_calls_total`, `chatbot_tool_duration_seconds`, and `chatbot_tool_errors_total`.
- Discussed implementation options and potential gaps in current observability.
- Aimed to support time-series queries for better monitoring and analysis.

Relates to SC-41316