Otel sources #47
Merged
Add migration 034 with health_endpoint_format discriminator on services table (default/schema/prometheus/otlp) and team_api_keys table for OTLP push authentication. Backfills existing services with schema_config to schema format. Adds HealthEndpointFormat, TeamApiKey, and CreateTeamApiKeyInput types, extends audit action/resource types for API key operations, and updates store input types.
Implement team API key store infrastructure for OTLP push authentication:
- ITeamApiKeyStore interface with findByTeamId, findByKeyHash, create, delete, updateLastUsed
- TeamApiKeyStore implementation with dps_ prefixed key generation and SHA-256 hashing
- Register store in StoreRegistry
- Unit tests covering all CRUD operations
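A minimal sketch of dps_ prefixed key generation with SHA-256 hashing, as described in the commit above. Only the prefix and the hash algorithm come from the PR; the function names, the 32 random bytes, and the hex encoding are assumptions made for illustration.

```typescript
import { createHash, randomBytes } from "node:crypto";

// Only the "dps_" prefix and SHA-256 are stated in the PR.
// The helper names, key length, and hex encoding are assumptions.
const KEY_PREFIX = "dps_";

// Hash a raw key so that only the digest needs to be stored.
function hashApiKey(rawKey: string): string {
  return createHash("sha256").update(rawKey).digest("hex");
}

// Generate a new key; the raw value would be shown to the caller once.
function generateApiKey(): { rawKey: string; keyHash: string } {
  const rawKey = KEY_PREFIX + randomBytes(32).toString("hex");
  return { rawKey, keyHash: hashApiKey(rawKey) };
}
```

Storing only the hash means a leaked database row cannot be replayed as a credential; lookups hash the presented key and compare digests.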
API key authentication middleware for OTLP push endpoints:
- Validates Authorization: Bearer dps_... header format
- Hashes key and looks up by key_hash in team_api_keys table
- Sets req.apiKeyTeamId on success, updates last_used_at
- Returns 401 for missing, malformed, or invalid keys
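The header validation step above could look roughly like the following. This is a sketch under stated assumptions: parseApiKeyHeader and the AuthResult shape are hypothetical names, and the hash lookup and last_used_at update are left out.

```typescript
// Hypothetical result type; the real middleware sets req.apiKeyTeamId instead.
type AuthResult =
  | { ok: true; rawKey: string }
  | { ok: false; status: 401; reason: "missing" | "malformed" };

// Validate an "Authorization: Bearer dps_..." header value.
function parseApiKeyHeader(header: string | undefined): AuthResult {
  if (header === undefined || header.trim() === "") {
    return { ok: false, status: 401, reason: "missing" };
  }
  const scheme = "Bearer ";
  const value = header.trim();
  if (!value.startsWith(scheme)) {
    return { ok: false, status: 401, reason: "malformed" };
  }
  const rawKey = value.slice(scheme.length).trim();
  // Require the dps_ prefix, a non-empty key body, and no embedded whitespace.
  if (!rawKey.startsWith("dps_") || rawKey.length <= 4 || /\s/.test(rawKey)) {
    return { ok: false, status: 401, reason: "malformed" };
  }
  return { ok: true, rawKey };
}
```

A key that parses correctly would still be rejected with 401 if its hash is not found in team_api_keys.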
Team API key management endpoints mounted on team router:
- GET /api/teams/:id/api-keys (list, team lead/admin, strips key_hash)
- POST /api/teams/:id/api-keys (create, returns raw key once)
- DELETE /api/teams/:id/api-keys/:keyId (revoke with ownership check)
- Audit logging on create and revoke
- Route-level tests covering RBAC, validation, and audit events
Add health_endpoint_format selector to ServiceForm with conditional field visibility (OTLP hides endpoint URL, schema shows editor). Add format badge to ServiceDetail, API key CRUD client functions, ApiKeys component with create/revoke/copy workflow and collector config snippet, and API Keys tab on TeamDetail for team leads/admins.
Document health_endpoint_format column, team_api_keys table, API key authentication, OTLP receiver endpoint, format-aware parser dispatch, Prometheus parsing, OTLP rate limiting, and ITeamApiKeyStore across all relevant spec files.
Add OTLP push ingestion, Prometheus scraping, and API key auth to README features and API table. Add OTLP rate limit env vars to .env.example.
Migration 035 adds rate_limit_rpm (nullable INTEGER) and rate_limit_admin_locked (INTEGER DEFAULT 0) to team_api_keys. Migration 036 creates api_key_usage_buckets table with composite PK (api_key_id, bucket_start, granularity) and supporting indexes for per-key time-series usage tracking.
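To illustrate how the composite key (api_key_id, bucket_start, granularity) identifies one usage row, here is a small sketch that floors timestamps to bucket starts and counts pushes in memory. The ISO-8601 string format for bucket_start and all function names are assumptions; the PR does not specify them.

```typescript
// Assumption: bucket_start is stored as an ISO-8601 UTC string.
type Granularity = "minute" | "hour";

const BUCKET_SIZE_MS: Record<Granularity, number> = {
  minute: 60 * 1000,
  hour: 60 * 60 * 1000,
};

// Floor a timestamp to the start of its minute or hour bucket.
function bucketStart(timestampMs: number, granularity: Granularity): string {
  const size = BUCKET_SIZE_MS[granularity];
  return new Date(Math.floor(timestampMs / size) * size).toISOString();
}

// Hypothetical in-memory stand-in for the table: one counter per composite key.
const buckets = new Map<string, number>();

// Count one push in both the minute and the hour bucket for this key.
function recordPush(apiKeyId: string, timestampMs: number): void {
  for (const granularity of ["minute", "hour"] as const) {
    const key = [apiKeyId, bucketStart(timestampMs, granularity), granularity].join("|");
    buckets.set(key, (buckets.get(key) ?? 0) + 1);
  }
}
```

Two pushes within the same minute increment the same two rows, which is the behaviour an upsert against the composite primary key would give.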
- Extend TeamApiKey with rate_limit_rpm and rate_limit_admin_locked columns
- Add ApiKeyUsageBucket interface for usage bucket rows
- Add findById, updateRateLimit, setAdminLock to TeamApiKeyStore
- Extend requireApiKeyAuth to set req.apiKeyId for downstream middleware
- Add IApiKeyUsageStore interface with bulkUpsert, getBuckets, getBucketsByTeam, getAllBuckets, getSummaryForKeys, and prune methods
- Implement all methods using better-sqlite3 prepared statements and transactions
- Register ApiKeyUsageStore in StoreRegistry
- Implement token bucket rate limiter with burst capacity, soft-limit warnings, OTLP-format 429 responses, and injectable getNow for testing
- Implement usage accumulator with 5-second bulk flush to SQLite, minute+hour granularity, and rejected count tracking
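A token bucket with burst capacity and an injectable clock can be sketched as follows. This is a generic illustration of the technique, not the PR's implementation: the class name, constructor parameters, and per-minute refill arithmetic are assumptions, and soft-limit warnings and 429 response formatting are omitted.

```typescript
// Generic token bucket; names and parameters are hypothetical.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly ratePerMinute: number;
  private readonly burstCapacity: number;
  private readonly getNow: () => number;

  constructor(ratePerMinute: number, burstCapacity: number, getNow: () => number = Date.now) {
    this.ratePerMinute = ratePerMinute;
    this.burstCapacity = burstCapacity;
    this.getNow = getNow;
    this.tokens = burstCapacity; // start full so an initial burst is allowed
    this.lastRefillMs = getNow();
  }

  // Refill based on elapsed time, then try to take one token.
  tryConsume(): boolean {
    const now = this.getNow();
    const elapsedMs = Math.max(0, now - this.lastRefillMs);
    const refill = (elapsedMs * this.ratePerMinute) / 60000;
    this.tokens = Math.min(this.burstCapacity, this.tokens + refill);
    this.lastRefillMs = now;
    if (this.tokens < 1) return false; // caller would respond with 429
    this.tokens -= 1;
    return true;
  }
}
```

Passing getNow as a parameter lets a test advance time deterministically instead of sleeping.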
- Add usage bucket pruning to DataRetentionService (minute=24h, hour=30d, orphaned=7d)
- Rename createOtlpRateLimit to createOtlpGlobalRateLimit for clarity
- Insert perKeyRateLimit and trackApiKeyUsage into /v1/metrics middleware chain
- PATCH /api/teams/:id/api-keys/:keyId/rate-limit (team lead, with lock check)
- GET /api/teams/:id/api-keys/:keyId/usage (team lead, time-series buckets)
- PATCH /api/admin/api-keys/:keyId/rate-limit (admin, with lock toggle)
- GET /api/admin/api-keys/:keyId/usage (admin, per-key time-series)
- GET /api/admin/otlp-usage (admin, cross-team hourly overview)
…ummaries
- Add rate_limit_rpm, rate_limit_is_custom, rate_limit_admin_locked, usage_1h/24h/7d, and rejected_24h/7d to team and admin otlpStats responses
- Uses batched getSummaryForKeys queries (3 per endpoint call)
- Update test schemas with rate limit columns and usage buckets table
- Extend OtlpApiKeyStats with rate limit and usage summary fields
- Add ApiKeyUsageBucket and ApiKeyUsageResponse types
- Add API client functions for team/admin rate limit PATCH and usage GET endpoints
Add rate limit display (with custom/default label and lock indicator) to the API keys management table, and a modal edit dialog for team leads to update rate limits on unlocked keys.
…s, and usage charts
Add per-key usage summary row (1h/24h/7d push counts, rejected warnings), rate limit display with lock indicator and edit dialog for team leads, expandable ApiKeyUsageChart per key (lazy-mounted), and warning badges for keys approaching or exceeding rate limits.
… admin rate limit controls
Add cross-team usage overview section with 24h/7d push totals, rejection counts, and top-5 keys table. Extend per-team key cards with usage summary rows, rate limit display with lock indicators, expandable usage charts, and admin rate limit edit dialog with lock checkbox. Add amber/red card highlighting for keys with active rejections. Add AdminOtlpUsageResponse type and update API client return type.
…UsageStore, and TeamApiKeyStore
…eceiver per-key rate limiting
… and warning badges
…db/types.ts
Add foundational TypeScript types for trace-based dependency discovery:
- DiscoverySource type ('manual' | 'otlp_metric' | 'otlp_trace')
- Span and CreateSpanInput interfaces for full span storage
- ExternalNodeEnrichment and UpsertExternalNodeEnrichmentInput interfaces
- Extend Dependency with discovery_source, user_display_name/description/impact
- Extend DependencyAssociation with is_auto_suggested, is_dismissed
- Extend ProactiveDepsStatus.health with optional percentiles
- Update all test mocks and runtime code for new required fields
- 037: discovery_source, user enrichment columns on dependencies; re-add is_auto_suggested/is_dismissed on dependency_associations
- 038: external_node_enrichment table for org-wide external node metadata
- 039: percentile latency columns (p50/p95/p99/min/max/request_count/source) on dependency_latency_history
- 040: spans table with indexes for trace correlation and timeline views
- 041: app_settings table seeded with span_retention_days = 7
- Register all five migrations in migrate.ts
- Trace types: OtlpSpan, OtlpSpanStatus, OtlpScopeSpans, OtlpResourceSpans, OtlpExportTraceServiceRequest
- Histogram types: OtlpHistogramDataPoint, OtlpHistogram
- Sum types: OtlpSum (monotonic + non-monotonic)
- Extend OtlpMetric with histogram? and sum? fields
New stores:
- ISpanStore / SpanStore: bulkInsert, findByTraceId, findByServiceName, deleteOlderThan for full span storage
- IAppSettingsStore / AppSettingsStore: get/set for admin-configurable app settings (e.g., span_retention_days)
- IExternalNodeEnrichmentStore / ExternalNodeEnrichmentStore: CRUD for org-wide external node enrichment metadata
Store extensions:
- DependencyStore.upsert(): passes discovery_source through INSERT; preserves 'manual' on conflict (never downgrade to otlp_trace)
- LatencyHistoryStore.recordWithPercentiles(): stores histogram-derived p50/p95/p99/min/max/requestCount with source tag
- AssociationStore: create() supports is_auto_suggested flag; new findAutoSuggested(), confirm(), dismiss() methods
All stores registered in StoreRegistry. Test inline schemas updated for new columns.
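The discovery_source conflict rule for DependencyStore.upsert() can be pictured as a small pure function. The PR confirms two cases: 'manual' is preserved on conflict, and (per the tests) otlp_metric can be upgraded to otlp_trace. Everything else in this sketch, including the function name and the "incoming wins otherwise" fallback, is an assumption.

```typescript
type DiscoverySource = "manual" | "otlp_metric" | "otlp_trace";

// Hypothetical helper: decide which discovery_source survives an upsert conflict.
function resolveDiscoverySource(
  existing: DiscoverySource | null,
  incoming: DiscoverySource,
): DiscoverySource {
  if (existing === null) return incoming; // plain insert, no conflict
  if (existing === "manual") return "manual"; // never downgrade a manual entry
  // Assumption: for all other combinations the incoming value replaces the old one.
  return incoming;
}
```

In SQL this kind of rule is typically expressed as a CASE expression inside the ON CONFLICT DO UPDATE clause.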
… extensions
Migration tests (037-041):
- Verify columns, defaults, indexes, FK constraints, cascade delete
- Verify discovery_source backfill for OTLP services
- Verify span_retention_days seeded default
New store tests:
- SpanStore: bulkInsert, findByTraceId ordering, findByServiceName with since/limit, deleteOlderThan
- AppSettingsStore: get seeded value, get missing key, set create/update
- ExternalNodeEnrichmentStore: upsert create/update, findByCanonicalName, findAll ordered, delete
Extended store tests:
- DependencyStore: discovery_source passthrough, manual default, manual preserved on conflict, otlp_metric upgradable to otlp_trace
- LatencyHistoryStore: recordWithPercentiles stores all fields, partial percentile data, backward compat with record()
- AssociationStore: create with is_auto_suggested, findAutoSuggested filters correctly, confirm/dismiss update flags
…cy discovery
Parses OTLP trace payloads, extracting dependency information from CLIENT and PRODUCER spans. Implements target name resolution priority chain (peer.service → db.system → messaging.system → rpc.system → server.address → url.full hostname), dependency type inference, auto-description generation, and deduplication by target name with aggregated latency/error state.
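The target name priority chain described above could be sketched like this. The attribute order comes from the commit message; the function name, the flattened string-valued attribute map, and the handling of unparseable URLs are assumptions.

```typescript
// Assumption: span attributes have already been flattened to string values.
type SpanAttributes = Record<string, string | undefined>;

// Order taken from the commit message; url.full is handled separately below.
const TARGET_ATTRIBUTE_PRIORITY = [
  "peer.service",
  "db.system",
  "messaging.system",
  "rpc.system",
  "server.address",
] as const;

// Return the first non-empty attribute in priority order, else the url.full hostname.
function resolveTargetName(attrs: SpanAttributes): string | null {
  for (const key of TARGET_ATTRIBUTE_PRIORITY) {
    const value = attrs[key];
    if (value) return value;
  }
  const url = attrs["url.full"];
  if (url) {
    try {
      return new URL(url).hostname || null;
    } catch {
      return null; // not a parseable absolute URL
    }
  }
  return null;
}
```

For example, a span carrying both db.system and server.address resolves to the db.system value because it appears earlier in the chain.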
…odule
Move the inline findOrCreateService helper from the metrics OTLP route into a reusable module so the upcoming trace receiver route can share it.
POST /v1/traces endpoint receives OTLP trace payloads, stores ALL spans for future timeline views, and feeds CLIENT/PRODUCER spans through the TraceDependencyBridge into the existing dependency upsert pipeline. Mounted with identical middleware stack as /v1/metrics (2mb limit).
Tests cover: valid payload acceptance, auto-registration, CLIENT/PRODUCER dependency creation, ALL span storage, DB target resolution, server.address fallback, invalid payload 400, unauthorized 401, status change events, idempotent upsert, duration_ms calculation, attribute serialization, multi-service payloads, and span kind filtering.
- histogramPercentiles utility: linear interpolation from OTLP histogram buckets with auto seconds-to-ms conversion
- OtlpParser: process histogram dataPoints for percentile extraction, sum dataPoints for gauge-like/counter values, buildDependency populates health.percentiles
- DependencyUpsertService: route histogram percentiles to recordWithPercentiles() with otlp_histogram source
- LatencyHistoryStore: bucket queries include avg_p50/p95/p99
- Tests: percentile utility, OtlpParser histogram/sum paths, DependencyUpsertService percentile recording; fix latency test schemas
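Linear interpolation over explicit-bounds histogram buckets, the technique named in the first bullet above, can be sketched as follows. The function name and edge-case choices (lower bound of 0 for the first bucket, returning the lower bound for the overflow bucket) are assumptions, and the seconds-to-ms conversion is not shown.

```typescript
// OTLP explicit-bounds histograms have bounds.length + 1 bucket counts:
// bucket i covers (bounds[i-1], bounds[i]], and the last bucket is unbounded above.
function histogramPercentile(bounds: number[], counts: number[], q: number): number | null {
  if (counts.length !== bounds.length + 1) return null;
  const total = counts.reduce((sum, c) => sum + c, 0);
  if (total === 0) return null;

  const rank = q * total; // target position among all observations
  let cumulative = 0;
  for (let i = 0; i < counts.length; i++) {
    const count = counts[i] ?? 0;
    if (count === 0) continue;
    if (cumulative + count >= rank) {
      // Assumption: values are non-negative, so the first bucket starts at 0.
      const lower = i === 0 ? 0 : bounds[i - 1] ?? 0;
      // Assumption: the overflow bucket has no upper bound, so report its lower bound.
      if (i === bounds.length) return lower;
      const upper = bounds[i] ?? lower;
      const fraction = (rank - cumulative) / count;
      return lower + fraction * (upper - lower);
    }
    cumulative += count;
  }
  return bounds[bounds.length - 1] ?? null;
}
```

With bounds of 10, 20, and 50 and all ten observations in the (10, 20] bucket, the median lands halfway through that bucket.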
…ciation
Automatically link trace-discovered dependencies to registered services when an exact name match (case-insensitive) or canonical name match via alias resolution is found. Creates associations with is_auto_suggested=1, skips self-links and already-associated pairs (including dismissed), and catches UNIQUE constraint violations as no-ops for race-condition safety.
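The matching rules above can be sketched as a pure function. The rules themselves (case-insensitive exact match, alias-based canonical match, skipping self-links and existing pairs) come from the commit message; the function name, the RegisteredService shape, and representing aliases as a lowercase alias-to-canonical map are assumptions.

```typescript
// Hypothetical shape for a registered service row.
interface RegisteredService {
  id: string;
  name: string;
}

// Find a service to auto-associate with a trace-discovered dependency, if any.
function findAutoAssociationTarget(
  dependencyName: string,
  ownerServiceId: string,
  services: RegisteredService[],
  aliasToCanonical: Map<string, string>, // assumption: keys are lowercase aliases
  alreadyLinkedServiceIds: Set<string>, // includes dismissed associations
): RegisteredService | null {
  const name = dependencyName.trim().toLowerCase();
  const canonical = aliasToCanonical.get(name);
  for (const service of services) {
    if (service.id === ownerServiceId) continue; // skip self-links
    if (alreadyLinkedServiceIds.has(service.id)) continue; // skip existing pairs
    const serviceName = service.name.toLowerCase();
    if (serviceName === name) return service;
    if (canonical !== undefined && serviceName === canonical.toLowerCase()) return service;
  }
  return null;
}
```

A match would then be persisted with is_auto_suggested=1, treating a UNIQUE constraint violation as a no-op.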
…s and external nodes
Add confirm/dismiss endpoints for auto-suggested associations, PATCH enrichment for trace-discovered dependencies, GET discovered dependencies list, external node enrichment CRUD, and frontend API client functions.
Extend the dependency graph to visually distinguish auto-discovered (trace-based) dependencies from manually configured ones, and overlay enrichment metadata on external nodes.
Backend:
- Add discoverySource, isAutoSuggested, associationId to GraphEdgeData
- Add discoveredDependencyCount to ServiceNodeData
- Include discovery_source, is_auto_suggested, association_id in dependency queries (DependencyStore)
- Populate new fields in DependencyGraphBuilder.createEdgeData()
- Add ExternalNodeBuilder.applyEnrichment() for enrichment overlay
- Wire ExternalNodeEnrichmentStore through GraphService
Frontend:
- Extend client GraphEdgeData with discoverySource, isAutoSuggested, associationId
- CustomEdge: dashed style + "suggested" badge for auto-suggested edges
- EdgeDetailsPanel: discovery source badge, confirm/dismiss buttons
- NodeDetailsPanel: enriched description/impact/contact for external nodes
Tests: 15 new test cases across backend and frontend
Add configurable span retention (default 7 days) via app_settings table, dismissed auto-suggestion cleanup using the same retention window, and admin GET/PUT endpoints at /api/admin/settings/span-retention.
…sing, and span storage
Document new tables (spans, app_settings, external_node_enrichment), extended columns on dependencies/associations/latency_history, trace ingestion flow (TraceParser, TraceDependencyBridge, AutoAssociator), histogram/sum metric processing, discovery source graph styling, and external node enrichment.
Summary
Add a full OpenTelemetry (OTLP) push ingestion pipeline, covering both metrics and traces, alongside Prometheus scraping support. This transforms depsera from a pull-only health poller into a hybrid pull/push observability platform with automatic dependency discovery from distributed traces.
Linear tickets: DPS-77 through DPS-116
Changes
OTLP Push Ingestion (DPS-77 -- DPS-82)
- health_endpoint_format column and OTel foundation types
- TeamApiKeyStore with full API key CRUD routes and audit logging
- requireApiKeyAuth middleware for team-scoped API key authentication
- Prometheus parsing (text/plain; version=0.0.4)
- OTLP receiver endpoint (POST /v1/metrics) with auto-registration of unknown services and per-service custom metric/attribute name mappings
API Key Rate Limiting & Usage Tracking (DPS-84 -- DPS-102)
- ApiKeyUsageStore for time-bucketed usage persistence
- perKeyRateLimit and trackApiKeyUsage middleware with retention pruning
- ApiKeyUsageChart, OtlpStats, ApiKeys, OtlpAdmin
Trace-Based Dependency Discovery (DPS-110 -- DPS-116)
- DiscoverySource, Span, ExternalNodeEnrichment
- SpanStore, ExternalNodeEnrichmentStore, AppSettingsStore, plus extensions to DependencyStore, AssociationStore, and LatencyHistoryStore
- TraceParser service -- extracts dependencies from CLIENT and PRODUCER spans in OTLP trace payloads
- TraceDependencyBridge -- converts trace-discovered dependencies into dependency records
- otlpServiceResolver -- shared module for service lookup/auto-creation
- POST /v1/traces endpoint with full integration tests
- AutoAssociator -- auto-associates trace-discovered dependencies to registered services
Documentation & Misc
Testing
- npm test
- npm run lint
Test coverage added:
- TraceParser, TraceDependencyBridge, OtlpParser, PrometheusParser, AutoAssociator, perKeyRateLimit, trackApiKeyUsage, ApiKeyUsageStore, TeamApiKeyStore, SpanStore, ExternalNodeEnrichmentStore, AppSettingsStore, histogramPercentiles, and validation utilities
- POST /v1/traces, POST /v1/metrics, rate limit/usage routes, admin OTLP stats, span retention, discovered dependency management, external node enrichment, and association confirm/dismiss flows
- ApiKeyUsageChart, OtlpStats, ApiKeys, OtlpAdmin
Checklist