
feat: per-component transform latency on pipeline metrics page#95

Merged
TerrifiedBug merged 12 commits into main from fix-component-latency-metrics-191
Mar 11, 2026

Conversation

@TerrifiedBug
Owner

Summary

  • Add nullable componentId column to PipelineMetric to store per-component latency rows alongside aggregate rows
  • Add componentId: null filter to all 13 existing aggregate queries across 5 files to prevent per-component rows from inflating metrics
  • Write per-component latency rows in the heartbeat handler via createMany (separate from the delta-tracking ingestMetrics pipeline, since latency is a gauge)
  • Add getComponentLatencyHistory tRPC procedure for historical per-component latency data
  • Replace single-line aggregate latency chart on pipeline metrics page with multi-line chart (one line per transform component, deterministic color palette)
  • Rename "Component Latency" → "Transform Latency" everywhere (Vector only emits component_latency_mean_seconds for transforms, not sources/sinks)
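The per-component gauge rows are keyed to a shared minute bucket. A minimal sketch of that truncation, assuming a hypothetical helper like `truncateToMinute` below (the real handler's code may differ):

```typescript
// Hypothetical helper illustrating minute-bucketing of gauge timestamps;
// the name truncateToMinute is not from the PR.
function truncateToMinute(date: Date): Date {
  const bucketed = new Date(date);
  bucketed.setSeconds(0, 0); // zero out seconds and milliseconds
  return bucketed;
}

// Two heartbeats arriving in the same minute map to the same row key.
const a = truncateToMinute(new Date("2026-03-11T14:14:07.100Z"));
const b = truncateToMinute(new Date("2026-03-11T14:14:59.900Z"));
console.log(a.toISOString());            // 2026-03-11T14:14:00.000Z
console.log(a.getTime() === b.getTime()); // true
```

Bucketing this way is what lets the upsert loop treat a (pipelineId, nodeId, componentId, minute) tuple as a stable row identity.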

Test plan

  • Run a pipeline with multiple transform components and verify per-component latency rows appear in PipelineMetric table
  • Verify pipeline metrics page (/pipelines/[id]/metrics) shows multi-line transform latency chart with one line per component
  • Verify main dashboard still shows single aggregate "Transform Latency" chart (not per-component)
  • Verify fleet detail table still shows aggregate latency column
  • Verify no "Component Latency" or "Pipeline Latency" labels remain in the UI
  • Confirm SLI evaluator uses aggregate-only data for latency alerts

@greptile-apps
Contributor

greptile-apps bot commented Mar 11, 2026

Greptile Summary

This PR adds per-component transform latency tracking to VectorFlow's pipeline metrics page. It extends PipelineMetric with a nullable componentId column, writes per-component gauge rows from the heartbeat handler, introduces a getComponentLatencyHistory tRPC procedure (correctly gated with withTeamAccess("VIEWER")), and replaces the single aggregate latency chart with a multi-line Recharts LineChart — one line per transform, with a deterministic colour palette. The 13 existing aggregate queries across all 5 files correctly gain componentId: null filters to prevent the new per-component rows from inflating any dashboard, SLI, or fleet metric.

Key points:

  • getComponentLatencyHistory correctly applies withTeamAccess("VIEWER") and averages multi-node rows server-side before returning data to the client, so multi-node deployments produce one line per component rather than one line per node.
  • The heartbeat upsert loop uses sequential findFirst + update/create (intentionally awaited to avoid the TOCTOU fire-and-forget race flagged in prior review). However, the migration does not add a unique constraint on (pipelineId, nodeId, componentId, timestamp), meaning concurrent heartbeats within the same minute window can still race and create duplicate rows — the database-level guard is missing.
  • The tooltip formatter uses Number(value) ?? 0, which is a no-op because Number() never returns null/undefined; NaN falls through and can produce "NaN ms" when a series has no value at the hovered timestamp. Using || 0 instead is the correct fix.
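The `??` vs `||` point can be demonstrated in isolation (illustrative functions, not the PR's actual formatter):

```typescript
// Illustrative only: shows why ?? cannot guard against NaN from Number().
const brokenFormat = (value: any) => Number(value) ?? 0; // ?? never fires: Number() returns a number, never null/undefined
const fixedFormat = (value: any) => Number(value) || 0;  // NaN is falsy, so || 0 replaces it

console.log(brokenFormat(undefined)); // NaN -> would render as "NaN ms"
console.log(fixedFormat(undefined));  // 0
console.log(fixedFormat("12.5"));     // 12.5
```

Note that `|| 0` also coerces a genuine 0 to 0, which is harmless for a latency gauge.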

Confidence Score: 3/5

  • Safe to merge functionally, but the missing unique constraint on the per-component metric tuple will accumulate duplicate rows over time and should be addressed.
  • The core feature is correctly implemented — auth middleware is present, multi-node averaging is correct, and all 13 existing aggregate queries are properly filtered. The confidence reduction comes from the absent database-level unique constraint, which means the soft deduplication in the heartbeat handler can be bypassed by concurrent requests, leading to indefinite table bloat.
  • prisma/migrations/20260311030000_add_component_id_to_pipeline_metric/migration.sql — a unique constraint on (pipelineId, nodeId, componentId, timestamp) should be added to enforce the deduplication invariant at the storage layer.
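A migration-level guard could look roughly like this sketch (the index name is illustrative, not from the PR; note that on PostgreSQL versions before 15, NULLs compare as distinct in unique indexes, so aggregate rows with a NULL componentId or nodeId would not be deduplicated by it):

```sql
-- Sketch only; index name is illustrative, not from the PR.
-- Deduplicates per-component rows; rows where "componentId" or "nodeId"
-- is NULL are unaffected on PostgreSQL < 15, since NULLs compare as distinct.
CREATE UNIQUE INDEX "PipelineMetric_pipelineId_nodeId_componentId_timestamp_key"
  ON "PipelineMetric" ("pipelineId", "nodeId", "componentId", "timestamp");
```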

Important Files Changed

  • src/server/routers/metrics.ts: Adds getComponentLatencyHistory tRPC procedure with correct withTeamAccess("VIEWER") middleware. Server-side averaging across nodes per (componentId, timestamp) is correct. componentId: null filter added to getPipelineMetrics.
  • src/app/api/agent/heartbeat/route.ts: Writes per-component latency rows via a sequential findFirst + conditional create/update upsert loop, keyed on a shared minute-truncated minuteTimestamp. Without a unique constraint in the migration, a TOCTOU race between concurrent heartbeats can still create duplicate rows, though sequential awaiting narrows the window compared to the original fire-and-forget approach.
  • prisma/migrations/20260311030000_add_component_id_to_pipeline_metric/migration.sql: Adds the nullable componentId column and a compound index on (pipelineId, componentId, timestamp). No unique constraint on (pipelineId, nodeId, componentId, timestamp) is added, leaving the TOCTOU duplicate-row race unguarded at the database level.
  • src/app/(dashboard)/pipelines/[id]/metrics/page.tsx: Replaces the aggregate latency chart with TransformLatencyChart. useMemo correctly re-derives chart data from the server response. Default minutes is initialised to 60, making "1h" the highlighted tab on load (consistent and intentional). The Number(value) ?? 0 fallback in the tooltip formatter is a no-op because Number() never returns null/undefined, but this only affects visual edge cases.
  • src/server/routers/dashboard.ts: All 8 aggregate queries correctly gain componentId: null to exclude the new per-component rows from dashboard stats, latency history, and fleet metrics.
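The multi-node averaging described for getComponentLatencyHistory can be sketched as a pure function (row shape and names are assumptions, not the PR's actual types):

```typescript
// Assumed row shape; the PR's Prisma model may differ.
interface MetricRow {
  componentId: string;
  nodeId: string;
  timestamp: string; // ISO minute bucket
  latencyMeanMs: number;
}

// Collapse multi-node rows to one point per (componentId, timestamp)
// by averaging latencyMeanMs across nodes.
function averageAcrossNodes(
  rows: MetricRow[]
): Record<string, { timestamp: string; latencyMeanMs: number }[]> {
  const buckets = new Map<string, { sum: number; count: number }>();
  for (const r of rows) {
    // Assumes componentId contains no "|"; a real implementation
    // would group on a structured key instead.
    const key = `${r.componentId}|${r.timestamp}`;
    const b = buckets.get(key) ?? { sum: 0, count: 0 };
    b.sum += r.latencyMeanMs;
    b.count += 1;
    buckets.set(key, b);
  }
  const out: Record<string, { timestamp: string; latencyMeanMs: number }[]> = {};
  for (const [key, { sum, count }] of buckets) {
    const [componentId, timestamp] = key.split("|");
    (out[componentId] ??= []).push({ timestamp, latencyMeanMs: sum / count });
  }
  return out;
}
```

With this shape, a two-node deployment reporting 10 ms and 30 ms for the same transform in the same minute yields a single 20 ms point, so the chart draws one line per component rather than one per node.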

Sequence Diagram

sequenceDiagram
    participant Agent as Vector Agent
    participant HB as /api/agent/heartbeat
    participant Ingest as ingestMetrics()
    participant DB as PipelineMetric (DB)
    participant tRPC as getComponentLatencyHistory
    participant UI as Pipeline Metrics Page

    Agent->>HB: POST heartbeat {pipelines, componentMetrics}
    HB->>Ingest: fire-and-forget (counter deltas → nodeId rows + nodeId:null aggregate)
    Note over HB: minuteTimestamp = now with seconds zeroed
    loop per pipeline × per component
        HB->>DB: findFirst(pipelineId, nodeId, componentId, timestamp)
        alt row exists
            HB->>DB: update latencyMeanMs
        else
            HB->>DB: create row {nodeId≠null, componentId≠null}
        end
    end
    HB-->>Agent: 200 OK

    UI->>tRPC: getComponentLatencyHistory(pipelineId, minutes)
    tRPC->>DB: findMany(pipelineId, componentId≠null, timestamp≥since)
    DB-->>tRPC: rows [{componentId, timestamp, latencyMeanMs}]
    Note over tRPC: Average rows by (componentId, timestamp)<br/>to collapse multi-node deployments
    tRPC-->>UI: {components: Record<id, [{timestamp, latencyMeanMs}]>}
    UI->>UI: Render multi-line LineChart (one line per transform)

Last reviewed commit: 3691f2b

  • Add withTeamAccess("VIEWER") to getComponentLatencyHistory procedure
  • Replace createMany with findFirst + upsert to deduplicate per-component latency rows within the same minute bucket
  • Average per-component latency across nodes in getComponentLatencyHistory to handle multi-node pipeline deployments correctly
  • Await per-component latency upserts sequentially to eliminate the TOCTOU race between concurrent heartbeat requests
@TerrifiedBug TerrifiedBug merged commit 16a2773 into main Mar 11, 2026
10 checks passed
@TerrifiedBug TerrifiedBug deleted the fix-component-latency-metrics-191 branch March 11, 2026 14:14
