Scout Ops is a framework for managing API alerts as code. Define your APIs and their SLO thresholds in simple YAML files organized by team, and the pipeline takes care of validation, generation, and deployment.
Managing alerts for API endpoints manually doesn't scale. As the number of endpoints grows, keeping alert rules consistent, up-to-date, and properly configured across environments becomes error-prone and time-consuming. Changes require navigating UIs, copy-pasting configurations, and hoping nothing gets missed.
Scout Ops treats alert definitions as code:
- Team-based YAML specs define APIs and their SLO thresholds, organized by team
- A global SLA file provides default thresholds that teams can override
- A build pipeline validates specs, generates alert rules, and deploys them
- A single global config controls datasource settings and query templates
```
specs/                            alerts/
  sla.yaml (global defaults)      validate -> build -> render -> deploy
  teams/
    payments/                                       |
      api.yaml ──────────────>
      sla.yaml ──────────────>    resources.json -> alerting platform
    integrations/
      api.yaml ──────────────>
      sla.yaml ──────────────>
```
- Define team API specs in `specs/teams/<team>/api.yaml` and SLO overrides in `specs/teams/<team>/sla.yaml`
- Validate specs for correctness (required fields, duplicates, format)
- Build JSON inputs by joining APIs with SLAs and resolving fallbacks against `specs/sla.yaml`
- Render Jsonnet templates into deployable alert resources
- Deploy via `grr diff` (preview) or `grr apply` (deploy)
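
The validation step names three kinds of checks: required fields, duplicates, and format. The sketch below illustrates how such checks could look for one team's parsed `api.yaml`. It is not the actual Scout Ops validator: the list of required fields, the kebab-case naming rule, and the function name `validate_apis` are all assumptions made for illustration.

```python
import re

# Assumption: these required fields and this name pattern are illustrative,
# not the real Scout Ops validation rules.
REQUIRED_FIELDS = ("name", "methods", "paths", "service")
NAME_PATTERN = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def validate_apis(apis):
    """Return a list of readable errors for the `apis` list of one api.yaml."""
    errors = []
    seen = set()
    for index, api in enumerate(apis):
        label = api.get("name") or f"<entry {index}>"
        # Required fields: each key must be present with a non-empty value.
        for field in REQUIRED_FIELDS:
            if not api.get(field):
                errors.append(f"{label}: missing required field '{field}'")
        name = api.get("name")
        if not name:
            continue
        # Format: names are assumed to be lowercase kebab-case strings.
        if not NAME_PATTERN.match(name):
            errors.append(f"{name}: name is not lowercase kebab-case")
        # Duplicates: a name may appear only once per spec.
        if name in seen:
            errors.append(f"{name}: duplicate API name")
        seen.add(name)
    return errors
```

Collecting every error instead of stopping at the first one lets a single run report all problems in a spec, which suits a check that runs before build and deploy.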
```bash
cd alerts
make config     # one-time: configure platform credentials
make generate   # validate + build + render
make diff       # preview what would change
make apply      # deploy to alerting platform
```

Define global SLO defaults:
```yaml
# specs/sla.yaml
apis:
  - name: "*"
    slo:
      error_rate:
        threshold: 5
        operator: ">"
        unit: percent
        window: 5m
        alert: true
      latency:
        threshold: 1000
        operator: ">"
        unit: ms
        window: 5m
        alert: true
```

Add a team with an API:
```yaml
# specs/teams/payments/api.yaml
apis:
  - name: v1-payments-initiate-post
    methods:
      - POST
    paths:
      - /v1/payments/initiate
    service:
      name: payment-service
    tags:
      team: payments
      tier: critical
      product: payments
```

Optionally override SLOs for specific APIs:
```yaml
# specs/teams/payments/sla.yaml
apis:
  - name: v1-payments-initiate-post
    slo:
      latency:
        threshold: 800
        operator: ">"
        unit: ms
        window: 5m
        alert: true
```

This generates alert rules like "v1-payments-initiate-post latency above 800ms" and "v1-payments-initiate-post error rate above 5%" (from the global default), with queries, labels, annotations, and notification settings all pulled from global config.
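
The fallback behaviour in this example (latency from the team file, error rate from the global `"*"` entry) can be sketched as a per-metric merge. This is only an illustration under stated assumptions: the function names `slo_for`, `resolve_slo`, and `describe` are hypothetical, the dicts stand in for the parsed YAML files, and the real build step may merge at a different granularity.

```python
import copy

# Plain dicts standing in for the parsed YAML files shown above.
GLOBAL_SLA = {"apis": [{"name": "*", "slo": {
    "error_rate": {"threshold": 5, "operator": ">", "unit": "percent",
                   "window": "5m", "alert": True},
    "latency": {"threshold": 1000, "operator": ">", "unit": "ms",
                "window": "5m", "alert": True},
}}]}
TEAM_SLA = {"apis": [{"name": "v1-payments-initiate-post", "slo": {
    "latency": {"threshold": 800, "operator": ">", "unit": "ms",
                "window": "5m", "alert": True},
}}]}


def slo_for(sla, name):
    """Return the slo mapping of the entry called `name`, or {} if absent."""
    for entry in sla.get("apis", []):
        if entry.get("name") == name:
            return entry.get("slo", {})
    return {}


def resolve_slo(api_name, global_sla, team_sla):
    """Start from the global "*" defaults, then let team metrics replace them."""
    resolved = copy.deepcopy(slo_for(global_sla, "*"))
    # Assumption: an override replaces a whole metric block, not single keys.
    resolved.update(copy.deepcopy(slo_for(team_sla, api_name)))
    return resolved


def describe(api_name, slo):
    """Summarise each alerting metric in the style of the rule names above."""
    words = {">": "above", "<": "below"}
    units = {"percent": "%"}
    lines = []
    for metric, rule in slo.items():
        if not rule.get("alert"):
            continue  # metrics with alert: false produce no rule
        operator = words.get(rule["operator"], rule["operator"])
        unit = units.get(rule["unit"], rule["unit"])
        lines.append(
            f"{api_name} {metric.replace('_', ' ')} {operator} {rule['threshold']}{unit}"
        )
    return lines


api = "v1-payments-initiate-post"
for line in describe(api, resolve_slo(api, GLOBAL_SLA, TEAM_SLA)):
    print(line)
```

Run as written, this prints the error-rate rule at 5% followed by the latency rule at 800ms. An API with no entry in the team file would keep the global 1000ms latency threshold.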
| Document | Description |
|---|---|
| Getting Started | Setup, prerequisites, and first deployment |
| Configuration | Global config, datasource setup, and workflow configuration |
| Spec Reference | API spec, SLA spec, and SLA resolution rules |
| Examples | Common use cases and patterns |
- Jsonnet — Data templating language