feat(mcp): first-class metric source support by dhable · Pull Request #2437 · hyperdxio/hyperdx

dhable · 2026-06-10T18:13:36Z

Summary

Why: Metric sources were MVP-only in MCP — describe_source early-returned with just metricTables, builder tiles on metric sources saved but failed at render time, the query tools couldn't carry metricName / metricType, and the prompt told agents to fall back to raw SQL. Closes that gap.

What changed:

New tools
- clickstack_list_metrics — paginated metric-name catalog with kind, namePattern (ILIKE), time-window filters, and opaque cursor pagination.
- clickstack_describe_metric — per-metric unit, description, attribute keys, sampled values. kind is required (paired with metricName by every upstream discovery tool, so requiring it eliminates the multi-kind probe path).
Extended tools
- clickstack_describe_source now picks a representative metric table (gauge → sum → histogram) and runs the existing column / map-key / value-sampling pipeline against it, plus a per-kind metric-name sample.
- clickstack_timeseries / clickstack_table accept metricType, metricName, isDelta on each select item, plus aggFn:"increase" for Sum counters. valueExpression defaults to "Value". Emits a hint when the renderer's 20-group cap on increase + groupBy may apply. Reject up-front when metricType is paired with a non-metric source (or omitted on a metric source) instead of falling through to a cryptic ClickHouse error.
Prompt — replaces the "use raw SQL for metric tiles" workaround with positive discovery-workflow guidance and one worked example per supported kind.
Out of scope: summary and "exponential histogram" kinds (no renderer support).

Live MCP testing findings & fixes

Live testing against a populated metric source surfaced 5 issues. All are fixed in this PR.

#	Symptom	Root cause	Fix
1	`Cast to ObjectId failed for value "..." (type string) at path "_id" for model "Source"` leaked through `clickstack_describe_source` / `_list_metrics` / `_describe_metric` on a bad `sourceId`	`getSource()` didn't guard against malformed input	Pre-check `ObjectId.isValid` + try/catch around `findOne`; all tools surface their existing "Source not found" branch (`19bc4205`)
2	`clickstack_timeseries` / `clickstack_table` accepted `metricType`+`metricName` on non-metric sources and failed at SQL render with `Unknown expression or function identifier "Value"`	No source-kind ↔ select-item shape guardrail in `runConfigTile`	New `assertSourceKindMatchesSelect()` helper called between source-not-found and the connection lookup — both directions of the mismatch return agent-actionable errors (`7001a915`)
3	`metricTables` map in `clickstack_list_sources` / `_describe_source` responses included a stray `_id` ObjectId alongside the valid kind keys	Inline `type: { … }` Mongoose subdoc auto-adds `_id`	Declare a child `Schema(..., { _id: false })` for `metricTables`; defense-in-depth `sanitizeMetricTables()` at the MCP response boundary (`7ef9d9ca`)
4	`clickstack_describe_metric` consistently timed out at 10 s on multi-kind metrics (e.g. `container.cpu.usage` lives in both gauge and sum)	Auto-detect path probed every populated kind serially, plus `fetchAttributeKeys` ran an unbounded `WHERE MetricName = ?` scan with no time filter	Require `kind` (every upstream tool already pairs it with name); bound `fetchAttributeKeys` with a CTE sample LIMIT + server-side `max_execution_time` (`30203f5e`)
5	`clickstack_describe_source` on a populated metric source timed out on the cold call, then succeeded on retry	Stages 2–5 fell through to the no-rollup main-table-scan path without `timestampValueExpression`, so the helpers' time filter was silently dropped	Thread `source.timestampValueExpression` through to `getMapKeys`, both `getAllKeyValues` calls, and `sampleMetricNamesForKind` (`31bc21b5`)

Screenshots or video

N/A — backend-only.

How to test on Vercel preview

N/A — non-UI change.

References

Linear Issue: Closes HDX-4347
Parent: HDX-4079

changeset-bot · 2026-06-10T18:13:40Z

🦋 Changeset detected

Latest commit: 3b0b570

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages

Name	Type
@hyperdx/api	Minor
@hyperdx/app	Minor
@hyperdx/otel-collector	Minor

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

vercel · 2026-06-10T18:13:42Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
hyperdx-oss	Ready	Preview, Comment	Jun 12, 2026 8:30pm
hyperdx-storybook	Ready	Preview, Comment	Jun 12, 2026 8:30pm

greptile-apps · 2026-06-10T18:20:13Z

Greptile Summary

This PR adds first-class metric source support to the HyperDX MCP server, replacing the previous "use raw SQL" workaround. It introduces clickstack_list_metrics and clickstack_describe_metric tools, extends clickstack_describe_source to discover metric schemas, and wires metricType/metricName/isDelta/aggFn:"increase" into the existing timeseries and table query tools.

Five production issues fixed: ObjectId validation in getSource, source-kind/select-item mismatch guardrail, metricTables._id leak via Mongoose subdoc, fetchAttributeKeys unbounded scan, and describeSource cold-start timeout — all addressed with targeted fixes.
QUERYABLE_METRIC_KINDS and sanitizeMetricTables consolidated into a shared metricKinds.ts module imported by every metric-aware tool; previously duplicated in three files.
clickstack_list_metrics (new) is the only tool in this PR that lacks the AbortController + Promise.race timeout guard applied to describeMetric and describeSource; its ClickHouse queries also carry no max_execution_time setting.

Confidence Score: 4/5

Safe to merge with the understanding that clickstack_list_metrics may hang under ClickHouse load; all other production issues flagged in the PR description are correctly addressed.

The new clickstack_list_metrics tool omits the timeout/abort circuit-breaker that both sibling tools in this PR (describeMetric, describeSource) were explicitly given after production timeout incidents. The GROUP BY MetricName query in fetchMetricsForKind has no abort_signal, no clickhouse_settings max_execution_time, and the outer handler has no Promise.race wrapper — the same conditions that caused the cold-start and multi-kind timeouts described in the PR. All other changes are well-guarded and the broader refactoring (metricKinds.ts consolidation, ObjectId guard, Mongoose _id: false, assertSourceKindMatchesSelect) is solid.

packages/api/src/mcp/tools/sources/listMetrics.ts — the fetchMetricsForKind function and the tool handler both need timeout/abort plumbing consistent with describeMetric.ts and describeSource.ts.

Important Files Changed

Filename	Overview
packages/api/src/mcp/tools/sources/listMetrics.ts	New paginated metric-catalog tool. Cursor encoding/pagination logic is correct, but unlike the other metric tools introduced here it has no wall-clock timeout, no abort signal on ClickHouse queries, and no per-query max_execution_time setting.
packages/api/src/mcp/tools/sources/describeMetric.ts	New per-metric drill-down tool. AbortController timeout is hoisted into a finally block (clearing the timer on success), sampleAttributeValues now mirrors fetchAttributeKeys's CTE LIMIT + max_execution_time pattern, and kind is required rather than auto-detected serially.
packages/api/src/mcp/tools/sources/describeSource.ts	Extended with metric-source support: picks a representative metric table, runs the existing column/map/value pipeline against it, and samples MetricName per kind. timestampValueExpression is correctly threaded into all metadata helpers.
packages/api/src/mcp/tools/sources/metricKinds.ts	New shared module consolidating QUERYABLE_METRIC_KINDS and sanitizeMetricTables that were previously duplicated across three files; compile-time assertion keeps the string literals in sync with the MetricsDataType enum.
packages/api/src/mcp/tools/query/helpers.ts	Adds assertSourceKindMatchesSelect guardrail, metric select defaulting, and increase+groupBy top-N hint. Uses ?? for valueExpression defaulting. Logic is sound.
packages/api/src/mcp/tools/query/schemas.ts	Adds metricType/metricName/isDelta/increase to select item schema; getMetricSelectIssues enforces cross-field rules (sum-only increase, histogram quantile+level, etc.); applyMetricSelectDefaults defaults valueExpression to "Value" for metric items.
packages/api/src/controllers/sources.ts	getSource now validates ObjectId before calling findOne and wraps in try/catch to convert Mongoose CastError into null, preventing internal error leakage through MCP tools.
packages/api/src/models/source.ts	MetricTablesSchema declared with _id: false so Mongoose subdoc no longer auto-generates an ObjectId that leaks into MCP responses.

Sequence Diagram

sequenceDiagram
    participant Agent
    participant MCP as MCP Server
    participant CH as ClickHouse

    Agent->>MCP: clickstack_list_sources
    MCP-->>Agent: sources[] (with sanitized metricTables)

    Agent->>MCP: clickstack_describe_source(sourceId)
    MCP->>CH: getColumns(representativeMetricTable)
    MCP->>CH: getMapKeys / getAllKeyValues (timestampValueExpression scoped)
    MCP->>CH: sampleMetricNamesForKind (per populated kind)
    MCP-->>Agent: columns, mapKeys, metricNames sample

    Agent->>MCP: clickstack_list_metrics(sourceId, kind?, namePattern?)
    MCP->>CH: fetchMetricsForKind (GROUP BY MetricName, time-filtered)
    Note over MCP,CH: No AbortSignal / max_execution_time
    MCP-->>Agent: metrics[], nextCursor?

    Agent->>MCP: clickstack_describe_metric(sourceId, metricName, kind)
    MCP->>CH: fetchAttributeKeys (CTE LIMIT + max_execution_time:8s)
    MCP->>CH: sampleAttributeValues (CTE LIMIT + max_execution_time:8s)
    Note over MCP: Promise.race(impl, 10s timeout)
    MCP-->>Agent: attributeKeys, sampledValues, unit, description

    Agent->>MCP: clickstack_timeseries / clickstack_table
    MCP->>MCP: validateMetricSelectItems + applyMetricSelectDefaults
    MCP->>MCP: assertSourceKindMatchesSelect
    MCP->>CH: queryChartConfig (max_execution_time:30s)
    MCP-->>Agent: result + increase top-N hint?

_{Reviews (9): Last reviewed commit: "[HDX-4347] feat(mcp): distinguish unsamp..." | Re-trigger Greptile}

github-actions · 2026-06-10T18:35:16Z

E2E Test Results

✅ All tests passed • 202 passed • 3 skipped • 1319s

Status	Count
✅ Passed	202
❌ Failed	0
⚠️ Flaky	1
⏭️ Skipped	3

Tests ran across 4 shards in parallel.

View full report →

dhable · 2026-06-10T18:45:40Z

          );
          throw e;
+        } finally {
+          if (timeoutId !== undefined) clearTimeout(timeoutId);


Drive-by fix: this setTimeout-never-cleared pattern is pre-existing and was not flagged by greptile (the surrounding code is untouched by HDX-4347). Fixing here for consistency with the same fix in describeMetric.ts two files over — both files now clearTimeout in finally so a stale controller.abort() will no longer fire DESCRIBE_TIMEOUT_MS after every successful describe call, and the setTimeout closure is released immediately on resolution.

…emas Extend the MCP select-item schemas with first-class metric fields so the clickstack_timeseries, clickstack_table, and dashboard builder tile tools can accept metric-source queries: - Add 'increase' to MCP_AGG_FN_OPTIONS (sum-only counter-increase aggFn) - Add optional metricType (gauge/sum/histogram), metricName, and isDelta fields to mcpSelectItemSchema and mcpTileSelectItemSchema - Restrict metricType to the three kinds the renderer supports today; summary and exponential histogram are excluded - Extract cross-field validation into getMetricSelectIssues so both schemas share the same metric rules: metricType<->metricName paired, 'increase' is sum-only, histogram only allows quantile (with level) or count, isDelta is gauge-only, valueExpression is required for non-count aggFns unless metricType is set (defaults to 'Value' for metric sources) - Apply validation imperatively in the timeseries/table handlers via validateMetricSelectItems (works around a Zod 3.x quirk where .superRefine widens optional-field inference and breaks the strict consumer types of mergeWhereIntoSelectItems and resolveOrderBy) - Update timeseries/table tool descriptions with a METRIC SOURCES subsection covering the discovery workflow chain and per-kind aggFn guidance, plus the 20-group top-N cap on sum+increase+groupBy - Exclude 'increase' from AGG_FN_NAMES so resolveOrderBy doesn't try to synthesize a non-existent SQL increase() function Adds 20 unit tests covering the validator branches and the resolveOrderBy handling of 'increase'. No new TypeScript errors above the pre-existing baseline on main.

…er path Make clickstack_timeseries and clickstack_table actually work against metric sources. Before this change the builder branch of runConfigTile fell back to the source's empty tableName and never threaded source.metricTables onto the chart config, so any metric query failed with 'Both table name and UUID are empty' at render time. For metric sources, the builder branch now: - Sets from.tableName = '' (renderer picks the per-kind ClickHouse table from metricTables at render time) - Adds metricTables: source.metricTables to the chart config (mirrors packages/api/src/routers/external-api/v2/charts.ts:261-267) - Defaults each select item's valueExpression to 'Value' when missing, matching the agent-friendly external-API behavior so metric queries can omit valueExpression and have it resolve correctly Also surfaces the renderer's increase+groupBy top-N cap (20 groups) to agents: - Adds annotateIncreaseTopNHint helper that attaches a neutral hint to successful, non-empty tool results when any select item uses aggFn:'increase' alongside a non-empty groupBy - Exposes INCREASE_TOP_N_CAP=20 as a named constant referencing the renderer's INCREASE_MAX_NUM_GROUPS in common-utils - Wires the hint into both clickstack_timeseries and clickstack_table Adds 9 unit tests covering the hint helper across the increase+groupBy condition, empty results, error results, missing groupBy, missing increase aggFn, and malformed/missing content envelopes. No new TypeScript errors above the pre-existing baseline on main.

Replace the early-return for metric sources at describeSource.ts:97-115 with a metric-aware discovery flow so agents can use the existing describe-then-query workflow against metric sources without falling back to clickstack_sql. For metric sources, describe_source now: - Picks a representative metric table (gauge -> sum -> histogram) via pickRepresentativeMetricTable, exposed in meta.discoveryMetricKind so agents know which kind's schema they're seeing - Reuses the existing stages 1-4 (columns, map keys, low-cardinality values, map-attribute values) against the representative table — the OTel Collector metric schemas share Attributes/ResourceAttributes/ ScopeAttributes shape, so one sample is representative enough for the starter view - Samples up to 20 distinct MetricName values per queryable kind in a new stage 5 (sampleMetricNamesForKind), reusing the discovery dateRange and respecting the existing wall-clock timeout / partial-result flag - Defensively enriches each sampled metric with MetricUnit / MetricDescription when those columns exist on the kind's table (getColumns presence check); falls back gracefully on schemas that omit them - Surfaces the new clickstack_list_metrics / clickstack_describe_metric discovery tools via nextSteps.discovery so agents can paginate or drill into per-metric attribute keys when the sample isn't enough Updates the existing locking test in sources.test.ts that asserted columns/lowCardinalityValues/etc. were undefined for metric sources; the test now asserts the rich discovery payload (columns present, discoveryMetricKind set to 'gauge', nextSteps mention the new discovery tools). No new TypeScript errors above the pre-existing baseline on main.

…ribe_metric Add two new metric-specific discovery tools so agents can find and characterise metrics on a metric source without falling back to clickstack_sql or guessing OTel semantic-convention names. clickstack_list_metrics - Paginated metric-name catalog scoped to a metric source - Optional kind filter (gauge/sum/histogram); omit to scan every populated kind in order - Optional namePattern (ClickHouse ILIKE) and time window - Defaults to last 24h to surface only metrics with recent data - Opaque base64 cursor pagination over (kind, lastName) pairs so multi-kind scans resume cleanly mid-result - Default limit 50, max 500 - Defensive MetricUnit/MetricDescription enrichment using getColumns presence checks (matches the OTel Collector default schema; falls back gracefully on custom schemas) - Empty-result hint guides the agent to widen the filter or drop kind clickstack_describe_metric - Per-metric drill-down: kind(s), unit, description, attributeKeys per map column, sampled values (when sampleValues=true, default) - kind auto-detection across all populated tables when omitted, via parallel SELECT 1 probes per kind - Reuses metadata.getAllFields with metricName filter for attribute-key discovery scoped to the specific metric (existing helper at packages/common-utils/src/core/metadata.ts:736-816) - Per-kind usage guidance baked into the response (aggFn hints, level requirement for histogram quantile, 20-group cap for sum + increase) - Pre-built clickstack_timeseries call example in nextSteps.query with the right aggFn / level for the detected kind - 10s wall-clock timeout matches clickstack_describe_source Both tools reject non-Metric sources with a friendly hint pointing at clickstack_list_sources. Tool descriptions document the full discovery workflow chain: list_sources -> describe_source -> list_metrics -> describe_metric -> timeseries|table. Notable implementation detail: the queryable-kind tuple is declared as plain string literals ('gauge'|'sum'|'histogram') rather than MetricsDataType enum members because Zod 3.x's z.enum(...) infers an unknown-widened type when fed an enum-member tuple, which breaks the MCP SDK callback inference downstream. A compile-time assertion keeps the string literals in sync with the enum. No new TypeScript errors above the pre-existing baseline on main.

…idance in prompt The clickstack_save_dashboard prompt previously told the model to fall back to raw SQL tiles for any metric source because the MCP select-item shape did not carry metricType / metricName and builder tiles failed at render time with 'Both table name and UUID are empty'. Stages 1-3b fixed both gaps. Update the prompt and its regression tests to teach the new discovery + builder workflow. prompts/dashboards/content.ts - Replace the 'NOTE: Authoring builder tiles on a metric source is not reliable today...' paragraph with a dedicated '== METRIC SOURCES ==' section that documents: * the required select-item fields (metricType, metricName) and the valueExpression='Value' default * per-kind aggregation guidance for gauge / sum / histogram including the histogram quantile + level requirement and the 20-group cap on sum + increase + groupBy * the five-step discovery workflow (list_sources -> describe_source -> list_metrics -> describe_metric -> timeseries|table) * one worked example per supported kind using real OTel metric names - Drop 'metric tiles with multiple metricTables' and 'builder tiles on metric sources' from the validation-known-gaps paragraphs at lines 127 and 1247 since both gaps are closed. __tests__/dashboards/prompts.test.ts - Rewrite the two regression tests that asserted the old workaround language. The new tests assert the positive guidance is present (metricType / metricName / isDelta fields are named, per-kind aggFn rules and 20-group cap are documented, the five-step workflow chain is enumerated, one worked example per kind appears) AND assert the old workaround language is gone (no 'Authoring builder tiles on a metric source is not reliable', no 'Both table name and UUID are empty').

…s, cursor Adds unit and integration coverage for the metric pieces shipped in Stages 1-3b. listMetricsCursor.test.ts (NEW, unit) - 12 tests for encodeCursor / decodeCursor round-trips and rejection cases. Exercises malformed base64, malformed JSON, missing fields, wrong types, non-queryable kinds (summary, exponential histogram), empty input, and unicode metric names. Exports cursor helpers from listMetrics.ts as @internal so they can be tested without spinning up ClickHouse. sources.test.ts (integration; expanded) - clickstack_list_metrics: * rejects non-metric sources and unknown source IDs * returns metrics across every populated kind when kind is omitted * narrows to a single kind when kind is set * applies namePattern as a server-side ILIKE filter * paginates via opaque nextCursor and resumes cleanly * rejects malformed cursors with an actionable error * surfaces the empty-result hint - clickstack_describe_metric: * rejects non-metric sources * returns an actionable hint when the metric is unknown * auto-detects kind for a gauge metric, returns attribute keys + sampled values + matching usage example * respects sampleValues:false * returns the increase-shaped query example for sum kind * returns the quantile + level example for histogram kind * rejects an explicit kind when the source has no table for it queryTool.test.ts (integration; expanded) - clickstack_timeseries and clickstack_table: * gauge avg over time * sum + aggFn:'increase' table * histogram + aggFn:'quantile' + level:0.95 timeseries - Schema rejection paths: * aggFn:'increase' on a gauge -> friendly schema error * aggFn:'quantile' on a histogram without level * aggFn:'avg' on a histogram (not supported) * metricType set without metricName * isDelta on a non-gauge metric All unit tests pass (103 total: 63 query + 12 cursor + 28 prompts). Integration tests require make dev-int with the standard Docker stack to run end-to-end (they rely on bulkInsertMetricsGauge/Sum/Histogram seeding). No new TypeScript errors above the pre-existing baseline on main.

Update the user-facing MCP tool reference to reflect the new metric discovery + querying capabilities shipped in Stages 1-3b. - Add clickstack_list_metrics and clickstack_describe_metric rows to the tool table - Mark clickstack_describe_source / clickstack_timeseries / clickstack_table as supporting metric sources alongside log/trace - Add a new 'Metric Sources' section covering: * the required metricType + metricName fields on each select item * the valueExpression='Value' default * per-kind aggregation guidance (gauge/sum/histogram, the increase top-N cap on sum + groupBy, the level requirement for histogram quantile) * the unsupported metric kinds (summary, exponential histogram) * the five-step discovery workflow chaining list_sources -> describe_source -> list_metrics -> describe_metric -> timeseries|table Closes the documentation acceptance criterion on HDX-4347.

- Extract QUERYABLE_METRIC_KINDS to a shared packages/api/src/mcp/tools/sources/metricKinds.ts so describeMetric, listMetrics, describeSource, and the query / dashboard tile schemas all reference a single source of truth. Adding a new queryable kind (e.g. when the renderer learns 'summary') is now a one-file change (greptile). - Drop the unused 'ms' import in listMetrics.ts that would have failed CI lint (greptile). - Clear the describe_metric wall-clock timeout in 'finally' so a stale controller.abort() does not fire DESCRIBE_TIMEOUT_MS after every successful call, and the setTimeout closure does not stay pinned for the same duration (greptile). - Apply the same clearTimeout fix to describe_source as a drive-by: the bug pre-existed this PR but the pattern is now consistent across both tools (PR comment posted on the line). - Use ?? instead of || for the metric-source valueExpression default in runConfigTile so an explicit empty string is not silently swapped for 'Value' (greptile). - Remove the unused MCP_METRIC_TYPE_OPTIONS export; mcpMetricTypeSchema now builds directly off the shared QUERYABLE_METRIC_KINDS tuple (knip). The dashboard tile mcpTileMetricTypeSchema also switches to the shared constant for consistency. Verified: yarn tsc --noEmit baseline diff = 0, yarn lint clean, yarn knip clean, 105 unit tests pass.

Bug 1: timeseries / table metric queries failed with 'Value expression is required for non-count aggregation functions' The Stage 2 default of valueExpression='Value' for metric items ran inside runConfigTile, AFTER buildTile had already called externalDashboardTileSchemaWithId.parse(). That schema's superRefine rejects non-count aggregations with an empty valueExpression, so any metric query that omitted the field never reached the runtime default. Add applyMetricSelectDefaults() to tools/query/schemas.ts and call it in the timeseries / table handlers BEFORE buildTile, mirroring the agent-friendly default the external API v2 charts path applies on the REST surface. The runtime default in runConfigTile stays as defensive coverage for saved dashboards. Bug 2: clickstack_describe_metric returned an empty attributeKeys map for metric scopes that DID have data metadata.getAllFields({ metricName }) was returning empty arrays for metric-scoped Map-column scans in the CI environment. Replace the call with fetchAttributeKeys(), a direct query that issues 'arrayDistinct(arrayFlatten(groupArray(mapKeys(col))))' for each Map column on the kind's table. This matches the proven pattern used by the web app's useFetchMetricResourceAttrs hook and is more transparent than the metadata helper for this single-purpose case. Verified end-to-end against make dev-int: - queryTool.test.ts: 53/53 pass (was 50/53 before this commit) - sources.test.ts: 76/76 pass (was 75/76 before this commit) - query.test.ts unit suite: 69/69 (added 4 new tests for applyMetricSelectDefaults) No new TypeScript errors above the pre-existing baseline.

The new applyMetricSelectDefaults tests inferred narrow object shapes from inline literals, so 'valueExpression' was absent from the inferred generic type and TS errored on accessing 'out[0].valueExpression'. Annotate each test input as 'SelectItem' so the optional field stays in the inferred type. Tests behave identically at runtime; this is a TypeScript-only fix.

Calling clickstack_describe_source / clickstack_list_metrics / clickstack_describe_metric with a non-ObjectId sourceId surfaced a raw Mongoose CastError to the MCP client: Cast to ObjectId failed for value "…" (type string) at path "_id" for model "Source" The well-formed-but-missing path was already handled by each tool's "Source X not found" branch, but malformed IDs threw before getSource returned. Pre-check via mongoose.Types.ObjectId.isValid and wrap the findOne in a defensive try/catch so getSource returns null on any malformed input and the existing not-found branch fires cleanly. Add a controller unit test asserting both the malformed and missing ObjectId paths return null.

…ype mismatch clickstack_timeseries and clickstack_table accepted metricType + metricName on any source. When the source was not a metric source, the renderer composed SQL that referenced the metric "Value" column and let ClickHouse fail with: Unknown expression or function identifier `Value` in scope SELECT AVG(toFloat64OrDefault(toString(Value))) … FROM default.otel_traces The reciprocal case (metric source with no metricType) reached the renderer in the same broken state. Add assertSourceKindMatchesSelect() in tools/query/helpers.ts and call it inside runConfigTile right after getSource(). Both clickstack_timeseries and clickstack_table inherit the guard, with agent-actionable error messages that mirror the wording on clickstack_list_metrics / clickstack_describe_metric. Add unit tests covering both mismatch directions plus the no-op paths (raw-string select, empty metricType, non-array select).

clickstack_list_sources and clickstack_describe_source emitted the embedded metricTables subdoc with a stray Mongoose-generated _id key mixed in with the legitimate kind keys: "metricTables": { "gauge": "otel_metrics_gauge", "sum": "otel_metrics_sum", "_id": "6a2ad3b4da94be764e2deed8" } The metricTables field was declared as an inline `type: {...}` literal which Mongoose interprets as an embedded subdoc schema with an auto-generated _id field. Convert it to a real child Schema with { _id: false } so newly persisted documents stop carrying the stray id. Add a sanitizeMetricTables() helper as belt-and-braces defense for documents persisted before the schema fix, and use it at the MCP response boundary in both list_sources and describe_source. The helper looks up each valid MetricsDataType key via property access so it handles both plain objects and Mongoose subdoc instances. Tests: unit tests for sanitizeMetricTables covering the _id strip, non-string-value drop, unknown-key drop, and Mongoose-subdoc shape. Extend the existing describe_source integration test to assert the response payload only contains valid kind keys.

clickstack_describe_metric historically auto-detected the metric kind when omitted, probing every populated metric table for the given MetricName and returning one entry per kind that matched. This path was the consistent source of 10s wall-clock timeouts on multi-kind metrics (e.g. container.cpu.usage which lives in both gauge and sum tables), even with sampleValues:false. The auto-detect path was also redundant: the upstream discovery tools both surface kind alongside name: - clickstack_list_metrics returns { name, kind, ... } - clickstack_describe_source returns metricNames grouped by kind By the time an agent reaches clickstack_describe_metric it already has the kind. Requiring it on the input drops the detectKindsForMetric probe, eliminates the multi-kind discovery path, and removes the only place the tool ran across more than one ClickHouse table. Also bound the per-kind attribute-keys discovery so the single-kind path stays fast on production-scale metric tables. The previous inline SQL ran an unbounded `WHERE MetricName = ?` scan with no time filter, which still exceeded the wall-clock timeout on tables with millions of rows per metric. Wrap the scan in a CTE that samples at most 100k matching rows (MetricName + time window) and aggregate map keys from that sample, plus an 8s server-side max_execution_time as a belt-and-braces safety net. Changes: - Make `kind` required in describeMetricSchema (was optional). - Remove detectKindsForMetric — auto-detect path no longer exists. - Collapse the if/else branch to the explicit-kind path. - Always return exactly one kindDetail; emit a top-level hint when the (metric, kind) pair has no signal so the agent can widen the window or verify the kind via clickstack_list_metrics. - Bound fetchAttributeKeys with a CTE-LIMIT sample + max_execution_time so the discovery query stays within the wall clock on hot metrics. - Update the tool description and the timeout hint accordingly. Tests: drop the auto-detect and multi-kind tests; existing tests that already pinned `kind` are unchanged. Update the unknown-metric test to reflect the new always-one-kind shape. Add a test asserting the tool rejects calls with missing kind.

…e_source clickstack_describe_source consistently timed out on the cold-cache first call against a populated metric source, then succeeded on retry. The bottleneck was the value-sampling stages (3 + 4) and the metric- name sampling stage (5), which all delegate to metadata.getAllKeyValues / metadata.getMapKeys. When the source has no metadataMaterializedViews configured (the case for every metric source today), those helpers fall back to scanning the raw metric table. The fallback path only applies a time filter when timestampValueExpression is provided alongside dateRange — and describe_source was passing dateRange but not timestampValueExpression. The result was an unbounded scan per key against the full metric table on cold cache, easily exceeding the 10s wall-clock timeout. Pass source.timestampValueExpression through to: - metadata.getMapKeys (stage 2) - metadata.getAllKeyValues × 2 (stages 3 + 4) - sampleMetricNamesForKind (stage 5) and its inner metadata.getAllKeyValues call For log / trace sources this is also a small win because the no-MVs fallback now scopes its scan, but the practical impact is on metric sources where the rollup MVs are not yet configured. No new tests: existing sources.test.ts coverage already exercises every stage on a metric source. The change is observably load-bearing on the live MCP server — the first call now returns the full payload in under the 10s budget instead of timing out.

…AttributeKeys Greptile pointed out that sampleAttributeValues hit the same (MetricName, time range) filter on the same metric table as fetchAttributeKeys but lacked any of the scan bounds the earlier commit added there: - no inner LIMIT subquery - no max_execution_time / timeout_overflow_mode On a hot metric with millions of matching rows in the default 24-hour window the query could run until the client-side AbortController fires the wall-clock timeout, after which the ClickHouse server may still be processing the work. Apply the same pattern used in fetchAttributeKeys: - Wrap the FROM/WHERE clause in a subquery that LIMITs at METRIC_ATTR_KEYS_SAMPLE_SIZE matching rows. The inner SELECT projects only the distinct set of map columns referenced by the sampled keys. - Add clickhouse_settings: { max_execution_time: METRIC_ATTR_KEYS_MAX_EXEC_SECONDS, timeout_overflow_mode: 'break' } as a server-side ceiling. Live retest against the metric source on staging confirms the attributeValues payload still comes back rich (same k8s.* diversity on container.cpu.usage) — the 100k-row sample is plenty for value discovery. Refs: #2437 (comment)

github-actions · 2026-06-11T21:22:06Z

🔴 Tier 4 — Critical

Touches auth, data models, config, tasks, OTel pipeline, ClickHouse, or CI/CD.

Why this tier:

Large diff: 2341 production lines changed (threshold: 1000)
Touches API routes or data models — hidden complexity risk

Review process: Deep review from a domain expert. Synchronous walkthrough may be required.
SLA: Schedule synchronous review within 2 business days.

Stats

Production files changed: 14
Production lines changed: 2341 (+ 1567 in test files, excluded from tier calculation)
Branch: hdx-4347-mcp-tools-improved-metrics-support
Author: dhable

To override this classification, remove the review/tier-4 label and apply a different review/tier-* label. Manual overrides are preserved on subsequent pushes.

brandon-pereira · 2026-06-12T18:01:14Z

+    return await Source.findOne({ _id: sourceId, team });
+  } catch {
+    // Defense-in-depth: if Mongoose still throws (e.g. a future cast
+    // path), treat it as "not found" so the MCP tools surface a clean


docs(non-blocking): we should make this comment more generic since this controller is used across the app (eg so the caller can surface a clean error)

brandon-pereira · 2026-06-12T19:20:44Z

@@ -92,14 +109,28 @@
      }
      const { startDate, endDate } = timeRange;



issue(existing, non-blocking): Not related to this PR, but during testing this the top reason that LLM went from using timeseries to raw sql was the following:

timeseries.ts threads granularity into the tile config (lines 135–142), but buildTile in
helpers.ts (lines 187–202) parses the input through externalDashboardTileSchemaWithId. The
line/stacked_bar sub-schema in zod.ts (lines 246–261) has no granularity field, so Zod
silently strips it.

// packages/api/src/utils/zod.ts:246-261 // ❌ granularity is not declared — Zod strips any extra fields const lineTileSchema = z.object({ type: z.literal('line'), // ... other fields, but no granularity });

Downstream, convertToInternalTileConfig never re-injects it — contrast with the REST charts path,
which explicitly defaults it:

// packages/api/src/routes/charts.ts:289 granularity: translatedGranularity ?? 'auto',

Impact

The renderer only includes the time bucket in SELECT/GROUP BY/ORDER BY when
chartConfig.granularity != null:

// packages/api/src/renderChartConfig.ts:109-128 if (chartConfig.granularity != null) { // adds __hdx_time_bucket to SELECT, GROUP BY, ORDER BY }

Because granularity was stripped, every MCP timeseries call for gauge/sum metrics (and
log/trace sources) returns a single collapsed row per group — no timestamp column at all —
regardless of what the agent passed.

The gauge/sum translation does bucket the inner CTE with granularity || 'auto', but then
spreads ...restChartConfig for the outer query, so the outer aggregation collapses. Histogram
only escapes this by hard-coding __hdx_time_bucket directly (line 2106).

brandon-pereira · 2026-06-12T19:25:03Z

      // 1 row, the data likely collapsed into a single time bucket. Add a
      // hint so the agent knows to adjust the time range.
      if (
        result.content?.[0]?.type === 'text' &&


issue(existing, non-blocking): Another existing but more clear one now with metrics:

Bug: Hint Heuristics Fire on Wrong Signal, Recommend Wrong Fix, and Clobber Each Other

1. Single-Bucket Hint Triggers on the Wrong Signal

The "only 1 time bucket" hint in timeseries.ts:170-189 triggers on data.length === 1, which conflates three distinct cases:

Genuine single-bucket collapse (granularity too coarse)

A single group when groupBy is set (working as intended)

The no-time-column collapse from previous comment — where the advice to widen range or use coarser granularity can't help and "coarser" is actually backwards

A more robust signal would be to inspect result.meta for the presence of the __hdx_time_bucket column rather than counting rows. This distinguishes the no-time-column case (issue #1) from a genuine single-bucket result, and avoids misfiring when groupBy produces exactly one group.

brandon-pereira · 2026-06-12T19:32:29Z

+      `aggFn:"increase" combined with groupBy is capped at the top ${INCREASE_TOP_N_CAP} ` +
+      `groups by max bucket sum at the renderer layer. ` +
+      `Results may not include every group present in the underlying data.`;
+    first.text = JSON.stringify(parsed, null, 2);


suggestion: annotateIncreaseTopNHint in helpers.ts:99-121 emits a "may be truncated" disclaimer whenever increase + groupBy returns any data — even with fewer than 20 groups — making it a false alarm rather than an actual detection.

Additionally, both this function and the single-bucket hint in timeseries.ts assign to parsed.hint directly. Since the single-bucket hint runs first (timeseries.ts:165) and annotateIncreaseTopNHint runs after (timeseries.ts:178), one silently overwrites the other with no indication either existed.

Suggested fix: change hint to hints: string[] and append at each site. Gate the truncation hint on an actual group count threshold (e.g. >= 20) rather than emitting unconditionally.

brandon-pereira · 2026-06-12T19:34:12Z

+      { tableName, error: e instanceof Error ? e.message : String(e) },
+      'sampleAttributeValues failed',
+    );
+    return {};


suggestion: Failures are silently discarded in several places — describeMetric.ts:301-307, describeMetric.ts:412-418, and per-kind skips in listMetrics.ts:385-395 — causing transient ClickHouse errors to surface as the noSignal hint (describeMetric.ts:584-604):

"No data found — try widening startTime/endTime"

Agents cannot distinguish "genuinely empty" from "fetch failed" and will follow incorrect recovery steps as a result.

Suggested fix: instead of swallowing errors into empty returns, surface a partialFailure indicator in the response shape — recording which kinds failed and why. This gives agents a recoverable, accurate signal ("some metric kinds failed") rather than a misleading empty-data path.

brandon-pereira · 2026-06-12T19:36:03Z

+      connectionId,
+      clickhouse_settings: {
+        max_execution_time: METRIC_ATTR_KEYS_MAX_EXEC_SECONDS,
+        timeout_overflow_mode: 'break',


suggestion: Callers can't distinguish "not sampled" from "no values". A per-key sampled: boolean/truncated: boolean would allow agents to know to adjust their query

brandon-pereira · 2026-06-12T19:47:19Z

@@ -92,14 +109,28 @@
      }
      const { startDate, endDate } = timeRange;


idea(non-blocking): probably OOS but I waned a better historic graph

It would be helpful if the timeseries tool returned per-bucket rows with a timestamp column for metric sources using increase + groupBy, rather than collapsing to per-group aggregates with no ts column — that way the bucket breakdown is actually visible in the output.

brandon-pereira · 2026-06-12T19:50:43Z

Overall - metrics support works very good! It fell back to sql a few times, everytime it did I asked my agent to explain why (findings above). I tested it on my local instance and a larger internal instance with tons of metrics, it worked 🥳

getSource is called from REST routers, external-api v2, and MCP tools alike — the inline comments referenced only the MCP surface. Reword to caller-neutral language per review feedback. Refs: #2437 (comment)

… to hints[] Two problems with the query-tool hint plumbing: 1. annotateIncreaseTopNHint emitted the 'results may be truncated' disclaimer whenever increase + groupBy returned any data — even 3 groups — making it a false alarm rather than a detection. 2. Both annotateIncreaseTopNHint and the single-bucket hint in timeseries.ts assigned to parsed.hint directly, so whichever ran second silently overwrote the other. Fixes: - Add countDistinctGroups(): resolves comma-separated groupBy segments against the result row keys (bracket-aware split) and counts distinct combinations. The truncation hint now only fires when the distinct group count reaches INCREASE_TOP_N_CAP. When no segment resolves to a row key (e.g. bracket-syntax map attrs render as arrayElement(...) aliases), skip the hint instead of crying wolf. - Replace the single hint: string with hints: string[] across the query-tool response shape (annotateIncreaseTopNHint, the single-bucket hint, and formatQueryResult's empty-result hint) via a shared appendHint() helper, so multiple advisories coexist. Tests: rewrite the annotateIncreaseTopNHint block — at/below cap, distinct-groups-vs-raw-rows, multi-column groupBy combinations, unresolvable groupBy, and append-not-overwrite coexistence. Refs: #2437 (comment)

…ist_metrics Discovery sub-query failures were silently swallowed into empty returns (fetchAttributeKeys / sampleAttributeValues catch → {}; listMetrics per-kind catch → continue). Transient ClickHouse errors then surfaced as the 'No data found — try widening startTime/endTime' hint, so agents could not distinguish 'genuinely empty' from 'fetch failed' and followed incorrect recovery steps. - fetchAttributeKeys / sampleAttributeValues now return a typed { ok, data } | { ok: false, error } result. Errors are sanitised (single line, capped at 200 chars). - describe_metric collects failures into partialFailure[{stage,error}] on the response and suppresses the no-data hint when any stage failed — empty attributeKeys can't be trusted as 'no data' then. A retry-oriented hint is emitted instead. - list_metrics collects per-kind fetch failures into partialFailure[{kind,error}] and suppresses the 'No metrics matched' hint when failures explain the empty result. Tests: new cases pointing a kind's metricTables entry at the logs table (exists, but no MetricName/TimeUnix columns) to force the discovery query to throw; assert partialFailure is present and the misleading widen-the-window hints are absent. Happy-path test asserts partialFailure stays undefined. Refs: #2437 (comment)

sampleAttributeValues caps value sampling at MAX_ATTR_KEYS_TO_SAMPLE (12) keys and only emits keys with non-empty samples — so a key absent from attributeValues was indistinguishable between 'skipped by the cap' and 'sampled but empty in the window'. Agents could not tell when to re-query a key directly. Add attributeValuesMeta to each kindDetail: - sampledKeys: display names actually queried for values - truncatedKeys: display names skipped because the cap fired A key in truncatedKeys was never queried; a key in sampledKeys but missing from attributeValues is genuinely empty in the window. Tool description documents the semantics so agents know to query truncated keys directly. Tests: 15-key metric → 12 sampled + 3 truncated, disjoint sets covering all keys; small metric → truncatedKeys empty. Refs: #2437 (comment)

greptile-apps Bot reviewed Jun 10, 2026

View reviewed changes

Comment thread packages/api/src/mcp/tools/sources/listMetrics.ts Outdated

Comment thread packages/api/src/mcp/tools/sources/describeMetric.ts

Comment thread packages/api/src/mcp/tools/sources/describeMetric.ts Outdated

Comment thread packages/api/src/mcp/tools/query/helpers.ts Outdated

dhable force-pushed the hdx-4347-mcp-tools-improved-metrics-support branch from 91f17a6 to 6f63d7f Compare June 10, 2026 18:26

dhable commented Jun 10, 2026

View reviewed changes

vercel Bot deployed to Preview – hyperdx-storybook June 10, 2026 18:47 View deployment

vercel Bot deployed to Preview – hyperdx-oss June 10, 2026 18:50 View deployment

vercel Bot deployed to Preview – hyperdx-storybook June 10, 2026 19:31 View deployment

vercel Bot deployed to Preview – hyperdx-oss June 10, 2026 19:32 View deployment

vercel Bot deployed to Preview – hyperdx-storybook June 10, 2026 19:41 View deployment

vercel Bot deployed to Preview – hyperdx-oss June 10, 2026 19:42 View deployment

vercel Bot deployed to Preview – hyperdx-storybook June 11, 2026 20:28 View deployment

vercel Bot deployed to Preview – hyperdx-oss June 11, 2026 20:30 View deployment

greptile-apps Bot reviewed Jun 11, 2026

View reviewed changes

Comment thread packages/api/src/mcp/tools/sources/describeMetric.ts

dhable added 14 commits June 11, 2026 15:53

[HDX-4347] chore: add changeset for metric MCP support

666a44c

dhable added 2 commits June 11, 2026 15:53

dhable force-pushed the hdx-4347-mcp-tools-improved-metrics-support branch from 31bc21b to e27d683 Compare June 11, 2026 20:56

dhable changed the title ~~[HDX-4347] feat(mcp): first-class metric source support~~ feat(mcp): first-class metric source support Jun 11, 2026

vercel Bot deployed to Preview – hyperdx-storybook June 11, 2026 21:01 View deployment

vercel Bot deployed to Preview – hyperdx-oss June 11, 2026 21:03 View deployment

vercel Bot deployed to Preview – hyperdx-storybook June 11, 2026 21:13 View deployment

vercel Bot deployed to Preview – hyperdx-oss June 11, 2026 21:14 View deployment

dhable marked this pull request as ready for review June 11, 2026 21:21

dhable requested a review from brandon-pereira June 11, 2026 21:21

github-actions Bot added the review/tier-4 Critical — deep review + domain expert sign-off label Jun 11, 2026

brandon-pereira reviewed Jun 12, 2026

View reviewed changes

dhable added 4 commits June 12, 2026 15:15

[HDX-4347] docs(controllers): generalize getSource comments

0c2e0d9

getSource is called from REST routers, external-api v2, and MCP tools alike — the inline comments referenced only the MCP surface. Reword to caller-neutral language per review feedback. Refs: #2437 (comment)

vercel Bot deployed to Preview – hyperdx-storybook June 12, 2026 20:28 View deployment

vercel Bot deployed to Preview – hyperdx-oss June 12, 2026 20:30 View deployment

		@@ -92,14 +109,28 @@
		}
		const { startDate, endDate } = timeRange;

Conversation

dhable commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Live MCP testing findings & fixes

Screenshots or video

How to test on Vercel preview

References

Uh oh!

changeset-bot Bot commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🦋 Changeset detected

Uh oh!

vercel Bot commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

greptile-apps Bot commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Test Results

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔴 Tier 4 — Critical

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Impact

Uh oh!

Choose a reason for hiding this comment

Bug: Hint Heuristics Fire on Wrong Signal, Recommend Wrong Fix, and Clobber Each Other

1. Single-Bucket Hint Triggers on the Wrong Signal

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

brandon-pereira commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

dhable commented Jun 10, 2026 •

edited

Loading

changeset-bot Bot commented Jun 10, 2026 •

edited

Loading

vercel Bot commented Jun 10, 2026 •

edited

Loading

greptile-apps Bot commented Jun 10, 2026 •

edited

Loading

github-actions Bot commented Jun 10, 2026 •

edited

Loading

github-actions Bot commented Jun 11, 2026 •

edited

Loading