feat(mcp): first-class metric source support#2437
Conversation
🦋 Changeset detectedLatest commit: 3b0b570 The changes in this PR will be included in the next version bump. This PR includes changesets to release 3 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Greptile SummaryThis PR adds first-class metric source support to the HyperDX MCP server, replacing the previous "use raw SQL" workaround. It introduces
Confidence Score: 4/5Safe to merge with the understanding that clickstack_list_metrics may hang under ClickHouse load; all other production issues flagged in the PR description are correctly addressed. The new clickstack_list_metrics tool omits the timeout/abort circuit-breaker that both sibling tools in this PR (describeMetric, describeSource) were explicitly given after production timeout incidents. The GROUP BY MetricName query in fetchMetricsForKind has no abort_signal, no clickhouse_settings max_execution_time, and the outer handler has no Promise.race wrapper — the same conditions that caused the cold-start and multi-kind timeouts described in the PR. All other changes are well-guarded and the broader refactoring (metricKinds.ts consolidation, ObjectId guard, Mongoose _id: false, assertSourceKindMatchesSelect) is solid. packages/api/src/mcp/tools/sources/listMetrics.ts — the fetchMetricsForKind function and the tool handler both need timeout/abort plumbing consistent with describeMetric.ts and describeSource.ts. Important Files Changed
Sequence DiagramsequenceDiagram
participant Agent
participant MCP as MCP Server
participant CH as ClickHouse
Agent->>MCP: clickstack_list_sources
MCP-->>Agent: sources[] (with sanitized metricTables)
Agent->>MCP: clickstack_describe_source(sourceId)
MCP->>CH: getColumns(representativeMetricTable)
MCP->>CH: getMapKeys / getAllKeyValues (timestampValueExpression scoped)
MCP->>CH: sampleMetricNamesForKind (per populated kind)
MCP-->>Agent: columns, mapKeys, metricNames sample
Agent->>MCP: clickstack_list_metrics(sourceId, kind?, namePattern?)
MCP->>CH: fetchMetricsForKind (GROUP BY MetricName, time-filtered)
Note over MCP,CH: No AbortSignal / max_execution_time
MCP-->>Agent: metrics[], nextCursor?
Agent->>MCP: clickstack_describe_metric(sourceId, metricName, kind)
MCP->>CH: fetchAttributeKeys (CTE LIMIT + max_execution_time:8s)
MCP->>CH: sampleAttributeValues (CTE LIMIT + max_execution_time:8s)
Note over MCP: Promise.race(impl, 10s timeout)
MCP-->>Agent: attributeKeys, sampledValues, unit, description
Agent->>MCP: clickstack_timeseries / clickstack_table
MCP->>MCP: validateMetricSelectItems + applyMetricSelectDefaults
MCP->>MCP: assertSourceKindMatchesSelect
MCP->>CH: queryChartConfig (max_execution_time:30s)
MCP-->>Agent: result + increase top-N hint?
Reviews (9): Last reviewed commit: "[HDX-4347] feat(mcp): distinguish unsamp..." | Re-trigger Greptile |
91f17a6 to
6f63d7f
Compare
E2E Test Results✅ All tests passed • 202 passed • 3 skipped • 1319s
Tests ran across 4 shards in parallel. |
| ); | ||
| throw e; | ||
| } finally { | ||
| if (timeoutId !== undefined) clearTimeout(timeoutId); |
There was a problem hiding this comment.
Drive-by fix: this setTimeout-never-cleared pattern is pre-existing and was not flagged by greptile (the surrounding code is untouched by HDX-4347). Fixing here for consistency with the same fix in describeMetric.ts two files over — both files now clearTimeout in finally so a stale controller.abort() will no longer fire DESCRIBE_TIMEOUT_MS after every successful describe call, and the setTimeout closure is released immediately on resolution.
…emas Extend the MCP select-item schemas with first-class metric fields so the clickstack_timeseries, clickstack_table, and dashboard builder tile tools can accept metric-source queries: - Add 'increase' to MCP_AGG_FN_OPTIONS (sum-only counter-increase aggFn) - Add optional metricType (gauge/sum/histogram), metricName, and isDelta fields to mcpSelectItemSchema and mcpTileSelectItemSchema - Restrict metricType to the three kinds the renderer supports today; summary and exponential histogram are excluded - Extract cross-field validation into getMetricSelectIssues so both schemas share the same metric rules: metricType<->metricName paired, 'increase' is sum-only, histogram only allows quantile (with level) or count, isDelta is gauge-only, valueExpression is required for non-count aggFns unless metricType is set (defaults to 'Value' for metric sources) - Apply validation imperatively in the timeseries/table handlers via validateMetricSelectItems (works around a Zod 3.x quirk where .superRefine widens optional-field inference and breaks the strict consumer types of mergeWhereIntoSelectItems and resolveOrderBy) - Update timeseries/table tool descriptions with a METRIC SOURCES subsection covering the discovery workflow chain and per-kind aggFn guidance, plus the 20-group top-N cap on sum+increase+groupBy - Exclude 'increase' from AGG_FN_NAMES so resolveOrderBy doesn't try to synthesize a non-existent SQL increase() function Adds 20 unit tests covering the validator branches and the resolveOrderBy handling of 'increase'. No new TypeScript errors above the pre-existing baseline on main.
…er path Make clickstack_timeseries and clickstack_table actually work against metric sources. Before this change the builder branch of runConfigTile fell back to the source's empty tableName and never threaded source.metricTables onto the chart config, so any metric query failed with 'Both table name and UUID are empty' at render time. For metric sources, the builder branch now: - Sets from.tableName = '' (renderer picks the per-kind ClickHouse table from metricTables at render time) - Adds metricTables: source.metricTables to the chart config (mirrors packages/api/src/routers/external-api/v2/charts.ts:261-267) - Defaults each select item's valueExpression to 'Value' when missing, matching the agent-friendly external-API behavior so metric queries can omit valueExpression and have it resolve correctly Also surfaces the renderer's increase+groupBy top-N cap (20 groups) to agents: - Adds annotateIncreaseTopNHint helper that attaches a neutral hint to successful, non-empty tool results when any select item uses aggFn:'increase' alongside a non-empty groupBy - Exposes INCREASE_TOP_N_CAP=20 as a named constant referencing the renderer's INCREASE_MAX_NUM_GROUPS in common-utils - Wires the hint into both clickstack_timeseries and clickstack_table Adds 9 unit tests covering the hint helper across the increase+groupBy condition, empty results, error results, missing groupBy, missing increase aggFn, and malformed/missing content envelopes. No new TypeScript errors above the pre-existing baseline on main.
Replace the early-return for metric sources at describeSource.ts:97-115 with a metric-aware discovery flow so agents can use the existing describe-then-query workflow against metric sources without falling back to clickstack_sql. For metric sources, describe_source now: - Picks a representative metric table (gauge -> sum -> histogram) via pickRepresentativeMetricTable, exposed in meta.discoveryMetricKind so agents know which kind's schema they're seeing - Reuses the existing stages 1-4 (columns, map keys, low-cardinality values, map-attribute values) against the representative table — the OTel Collector metric schemas share Attributes/ResourceAttributes/ ScopeAttributes shape, so one sample is representative enough for the starter view - Samples up to 20 distinct MetricName values per queryable kind in a new stage 5 (sampleMetricNamesForKind), reusing the discovery dateRange and respecting the existing wall-clock timeout / partial-result flag - Defensively enriches each sampled metric with MetricUnit / MetricDescription when those columns exist on the kind's table (getColumns presence check); falls back gracefully on schemas that omit them - Surfaces the new clickstack_list_metrics / clickstack_describe_metric discovery tools via nextSteps.discovery so agents can paginate or drill into per-metric attribute keys when the sample isn't enough Updates the existing locking test in sources.test.ts that asserted columns/lowCardinalityValues/etc. were undefined for metric sources; the test now asserts the rich discovery payload (columns present, discoveryMetricKind set to 'gauge', nextSteps mention the new discovery tools). No new TypeScript errors above the pre-existing baseline on main.
…ribe_metric
Add two new metric-specific discovery tools so agents can find and
characterise metrics on a metric source without falling back to
clickstack_sql or guessing OTel semantic-convention names.
clickstack_list_metrics
- Paginated metric-name catalog scoped to a metric source
- Optional kind filter (gauge/sum/histogram); omit to scan every
populated kind in order
- Optional namePattern (ClickHouse ILIKE) and time window
- Defaults to last 24h to surface only metrics with recent data
- Opaque base64 cursor pagination over (kind, lastName) pairs so
multi-kind scans resume cleanly mid-result
- Default limit 50, max 500
- Defensive MetricUnit/MetricDescription enrichment using getColumns
presence checks (matches the OTel Collector default schema; falls
back gracefully on custom schemas)
- Empty-result hint guides the agent to widen the filter or drop kind
clickstack_describe_metric
- Per-metric drill-down: kind(s), unit, description, attributeKeys per
map column, sampled values (when sampleValues=true, default)
- kind auto-detection across all populated tables when omitted, via
parallel SELECT 1 probes per kind
- Reuses metadata.getAllFields with metricName filter for attribute-key
discovery scoped to the specific metric (existing helper at
packages/common-utils/src/core/metadata.ts:736-816)
- Per-kind usage guidance baked into the response (aggFn hints, level
requirement for histogram quantile, 20-group cap for sum + increase)
- Pre-built clickstack_timeseries call example in nextSteps.query with
the right aggFn / level for the detected kind
- 10s wall-clock timeout matches clickstack_describe_source
Both tools reject non-Metric sources with a friendly hint pointing at
clickstack_list_sources. Tool descriptions document the full discovery
workflow chain: list_sources -> describe_source -> list_metrics ->
describe_metric -> timeseries|table.
Notable implementation detail: the queryable-kind tuple is declared as
plain string literals ('gauge'|'sum'|'histogram') rather than
MetricsDataType enum members because Zod 3.x's z.enum(...) infers an
unknown-widened type when fed an enum-member tuple, which breaks the MCP
SDK callback inference downstream. A compile-time assertion keeps the
string literals in sync with the enum.
No new TypeScript errors above the pre-existing baseline on main.
…idance in prompt
The clickstack_save_dashboard prompt previously told the model to fall
back to raw SQL tiles for any metric source because the MCP select-item
shape did not carry metricType / metricName and builder tiles failed at
render time with 'Both table name and UUID are empty'. Stages 1-3b
fixed both gaps. Update the prompt and its regression tests to teach the
new discovery + builder workflow.
prompts/dashboards/content.ts
- Replace the 'NOTE: Authoring builder tiles on a metric source is not
reliable today...' paragraph with a dedicated '== METRIC SOURCES =='
section that documents:
* the required select-item fields (metricType, metricName) and the
valueExpression='Value' default
* per-kind aggregation guidance for gauge / sum / histogram including
the histogram quantile + level requirement and the 20-group cap on
sum + increase + groupBy
* the five-step discovery workflow (list_sources -> describe_source
-> list_metrics -> describe_metric -> timeseries|table)
* one worked example per supported kind using real OTel metric names
- Drop 'metric tiles with multiple metricTables' and 'builder tiles on
metric sources' from the validation-known-gaps paragraphs at lines
127 and 1247 since both gaps are closed.
__tests__/dashboards/prompts.test.ts
- Rewrite the two regression tests that asserted the old workaround
language. The new tests assert the positive guidance is present
(metricType / metricName / isDelta fields are named, per-kind aggFn
rules and 20-group cap are documented, the five-step workflow chain
is enumerated, one worked example per kind appears) AND assert the
old workaround language is gone (no 'Authoring builder tiles on a
metric source is not reliable', no 'Both table name and UUID are
empty').
…s, cursor Adds unit and integration coverage for the metric pieces shipped in Stages 1-3b. listMetricsCursor.test.ts (NEW, unit) - 12 tests for encodeCursor / decodeCursor round-trips and rejection cases. Exercises malformed base64, malformed JSON, missing fields, wrong types, non-queryable kinds (summary, exponential histogram), empty input, and unicode metric names. Exports cursor helpers from listMetrics.ts as @internal so they can be tested without spinning up ClickHouse. sources.test.ts (integration; expanded) - clickstack_list_metrics: * rejects non-metric sources and unknown source IDs * returns metrics across every populated kind when kind is omitted * narrows to a single kind when kind is set * applies namePattern as a server-side ILIKE filter * paginates via opaque nextCursor and resumes cleanly * rejects malformed cursors with an actionable error * surfaces the empty-result hint - clickstack_describe_metric: * rejects non-metric sources * returns an actionable hint when the metric is unknown * auto-detects kind for a gauge metric, returns attribute keys + sampled values + matching usage example * respects sampleValues:false * returns the increase-shaped query example for sum kind * returns the quantile + level example for histogram kind * rejects an explicit kind when the source has no table for it queryTool.test.ts (integration; expanded) - clickstack_timeseries and clickstack_table: * gauge avg over time * sum + aggFn:'increase' table * histogram + aggFn:'quantile' + level:0.95 timeseries - Schema rejection paths: * aggFn:'increase' on a gauge -> friendly schema error * aggFn:'quantile' on a histogram without level * aggFn:'avg' on a histogram (not supported) * metricType set without metricName * isDelta on a non-gauge metric All unit tests pass (103 total: 63 query + 12 cursor + 28 prompts). Integration tests require make dev-int with the standard Docker stack to run end-to-end (they rely on bulkInsertMetricsGauge/Sum/Histogram seeding). No new TypeScript errors above the pre-existing baseline on main.
Update the user-facing MCP tool reference to reflect the new metric
discovery + querying capabilities shipped in Stages 1-3b.
- Add clickstack_list_metrics and clickstack_describe_metric rows to
the tool table
- Mark clickstack_describe_source / clickstack_timeseries /
clickstack_table as supporting metric sources alongside log/trace
- Add a new 'Metric Sources' section covering:
* the required metricType + metricName fields on each select item
* the valueExpression='Value' default
* per-kind aggregation guidance (gauge/sum/histogram, the increase
top-N cap on sum + groupBy, the level requirement for histogram
quantile)
* the unsupported metric kinds (summary, exponential histogram)
* the five-step discovery workflow chaining list_sources ->
describe_source -> list_metrics -> describe_metric ->
timeseries|table
Closes the documentation acceptance criterion on HDX-4347.
- Extract QUERYABLE_METRIC_KINDS to a shared packages/api/src/mcp/tools/sources/metricKinds.ts so describeMetric, listMetrics, describeSource, and the query / dashboard tile schemas all reference a single source of truth. Adding a new queryable kind (e.g. when the renderer learns 'summary') is now a one-file change (greptile). - Drop the unused 'ms' import in listMetrics.ts that would have failed CI lint (greptile). - Clear the describe_metric wall-clock timeout in 'finally' so a stale controller.abort() does not fire DESCRIBE_TIMEOUT_MS after every successful call, and the setTimeout closure does not stay pinned for the same duration (greptile). - Apply the same clearTimeout fix to describe_source as a drive-by: the bug pre-existed this PR but the pattern is now consistent across both tools (PR comment posted on the line). - Use ?? instead of || for the metric-source valueExpression default in runConfigTile so an explicit empty string is not silently swapped for 'Value' (greptile). - Remove the unused MCP_METRIC_TYPE_OPTIONS export; mcpMetricTypeSchema now builds directly off the shared QUERYABLE_METRIC_KINDS tuple (knip). The dashboard tile mcpTileMetricTypeSchema also switches to the shared constant for consistency. Verified: yarn tsc --noEmit baseline diff = 0, yarn lint clean, yarn knip clean, 105 unit tests pass.
Bug 1: timeseries / table metric queries failed with 'Value expression
is required for non-count aggregation functions'
The Stage 2 default of valueExpression='Value' for metric items ran
inside runConfigTile, AFTER buildTile had already called
externalDashboardTileSchemaWithId.parse(). That schema's superRefine
rejects non-count aggregations with an empty valueExpression, so any
metric query that omitted the field never reached the runtime default.
Add applyMetricSelectDefaults() to tools/query/schemas.ts and call it
in the timeseries / table handlers BEFORE buildTile, mirroring the
agent-friendly default the external API v2 charts path applies on the
REST surface. The runtime default in runConfigTile stays as defensive
coverage for saved dashboards.
Bug 2: clickstack_describe_metric returned an empty attributeKeys map
for metric scopes that DID have data
metadata.getAllFields({ metricName }) was returning empty arrays for
metric-scoped Map-column scans in the CI environment. Replace the call
with fetchAttributeKeys(), a direct query that issues
'arrayDistinct(arrayFlatten(groupArray(mapKeys(col))))' for each Map
column on the kind's table. This matches the proven pattern used by
the web app's useFetchMetricResourceAttrs hook and is more transparent
than the metadata helper for this single-purpose case.
Verified end-to-end against make dev-int:
- queryTool.test.ts: 53/53 pass (was 50/53 before this commit)
- sources.test.ts: 76/76 pass (was 75/76 before this commit)
- query.test.ts unit suite: 69/69 (added 4 new tests for
applyMetricSelectDefaults)
No new TypeScript errors above the pre-existing baseline.
The new applyMetricSelectDefaults tests inferred narrow object shapes from inline literals, so 'valueExpression' was absent from the inferred generic type and TS errored on accessing 'out[0].valueExpression'. Annotate each test input as 'SelectItem' so the optional field stays in the inferred type. Tests behave identically at runtime; this is a TypeScript-only fix.
Calling clickstack_describe_source / clickstack_list_metrics / clickstack_describe_metric with a non-ObjectId sourceId surfaced a raw Mongoose CastError to the MCP client: Cast to ObjectId failed for value "…" (type string) at path "_id" for model "Source" The well-formed-but-missing path was already handled by each tool's "Source X not found" branch, but malformed IDs threw before getSource returned. Pre-check via mongoose.Types.ObjectId.isValid and wrap the findOne in a defensive try/catch so getSource returns null on any malformed input and the existing not-found branch fires cleanly. Add a controller unit test asserting both the malformed and missing ObjectId paths return null.
…ype mismatch clickstack_timeseries and clickstack_table accepted metricType + metricName on any source. When the source was not a metric source, the renderer composed SQL that referenced the metric "Value" column and let ClickHouse fail with: Unknown expression or function identifier `Value` in scope SELECT AVG(toFloat64OrDefault(toString(Value))) … FROM default.otel_traces The reciprocal case (metric source with no metricType) reached the renderer in the same broken state. Add assertSourceKindMatchesSelect() in tools/query/helpers.ts and call it inside runConfigTile right after getSource(). Both clickstack_timeseries and clickstack_table inherit the guard, with agent-actionable error messages that mirror the wording on clickstack_list_metrics / clickstack_describe_metric. Add unit tests covering both mismatch directions plus the no-op paths (raw-string select, empty metricType, non-array select).
clickstack_list_sources and clickstack_describe_source emitted the
embedded metricTables subdoc with a stray Mongoose-generated _id key
mixed in with the legitimate kind keys:
"metricTables": {
"gauge": "otel_metrics_gauge",
"sum": "otel_metrics_sum",
"_id": "6a2ad3b4da94be764e2deed8"
}
The metricTables field was declared as an inline `type: {...}` literal
which Mongoose interprets as an embedded subdoc schema with an
auto-generated _id field. Convert it to a real child Schema with
{ _id: false } so newly persisted documents stop carrying the stray id.
Add a sanitizeMetricTables() helper as belt-and-braces defense for
documents persisted before the schema fix, and use it at the MCP
response boundary in both list_sources and describe_source. The helper
looks up each valid MetricsDataType key via property access so it
handles both plain objects and Mongoose subdoc instances.
Tests: unit tests for sanitizeMetricTables covering the _id strip,
non-string-value drop, unknown-key drop, and Mongoose-subdoc shape.
Extend the existing describe_source integration test to assert the
response payload only contains valid kind keys.
clickstack_describe_metric historically auto-detected the metric kind
when omitted, probing every populated metric table for the given
MetricName and returning one entry per kind that matched. This path
was the consistent source of 10s wall-clock timeouts on multi-kind
metrics (e.g. container.cpu.usage which lives in both gauge and sum
tables), even with sampleValues:false.
The auto-detect path was also redundant: the upstream discovery tools
both surface kind alongside name:
- clickstack_list_metrics returns { name, kind, ... }
- clickstack_describe_source returns metricNames grouped by kind
By the time an agent reaches clickstack_describe_metric it already
has the kind. Requiring it on the input drops the detectKindsForMetric
probe, eliminates the multi-kind discovery path, and removes the only
place the tool ran across more than one ClickHouse table.
Also bound the per-kind attribute-keys discovery so the single-kind
path stays fast on production-scale metric tables. The previous
inline SQL ran an unbounded `WHERE MetricName = ?` scan with no
time filter, which still exceeded the wall-clock timeout on tables
with millions of rows per metric. Wrap the scan in a CTE that
samples at most 100k matching rows (MetricName + time window) and
aggregate map keys from that sample, plus an 8s server-side
max_execution_time as a belt-and-braces safety net.
Changes:
- Make `kind` required in describeMetricSchema (was optional).
- Remove detectKindsForMetric — auto-detect path no longer exists.
- Collapse the if/else branch to the explicit-kind path.
- Always return exactly one kindDetail; emit a top-level hint when
the (metric, kind) pair has no signal so the agent can widen the
window or verify the kind via clickstack_list_metrics.
- Bound fetchAttributeKeys with a CTE-LIMIT sample + max_execution_time
so the discovery query stays within the wall clock on hot metrics.
- Update the tool description and the timeout hint accordingly.
Tests: drop the auto-detect and multi-kind tests; existing tests that
already pinned `kind` are unchanged. Update the unknown-metric test
to reflect the new always-one-kind shape. Add a test asserting the
tool rejects calls with missing kind.
…e_source
clickstack_describe_source consistently timed out on the cold-cache
first call against a populated metric source, then succeeded on retry.
The bottleneck was the value-sampling stages (3 + 4) and the metric-
name sampling stage (5), which all delegate to metadata.getAllKeyValues
/ metadata.getMapKeys.
When the source has no metadataMaterializedViews configured (the case
for every metric source today), those helpers fall back to scanning
the raw metric table. The fallback path only applies a time filter
when timestampValueExpression is provided alongside dateRange — and
describe_source was passing dateRange but not timestampValueExpression.
The result was an unbounded scan per key against the full metric
table on cold cache, easily exceeding the 10s wall-clock timeout.
Pass source.timestampValueExpression through to:
- metadata.getMapKeys (stage 2)
- metadata.getAllKeyValues × 2 (stages 3 + 4)
- sampleMetricNamesForKind (stage 5)
and its inner metadata.getAllKeyValues call
For log / trace sources this is also a small win because the no-MVs
fallback now scopes its scan, but the practical impact is on metric
sources where the rollup MVs are not yet configured.
No new tests: existing sources.test.ts coverage already exercises
every stage on a metric source. The change is observably load-bearing
on the live MCP server — the first call now returns the full payload
in under the 10s budget instead of timing out.
31bc21b to
e27d683
Compare
…AttributeKeys
Greptile pointed out that sampleAttributeValues hit the same
(MetricName, time range) filter on the same metric table as
fetchAttributeKeys but lacked any of the scan bounds the earlier
commit added there:
- no inner LIMIT subquery
- no max_execution_time / timeout_overflow_mode
On a hot metric with millions of matching rows in the default 24-hour
window the query could run until the client-side AbortController
fires the wall-clock timeout, after which the ClickHouse server may
still be processing the work.
Apply the same pattern used in fetchAttributeKeys:
- Wrap the FROM/WHERE clause in a subquery that LIMITs at
METRIC_ATTR_KEYS_SAMPLE_SIZE matching rows. The inner SELECT
projects only the distinct set of map columns referenced by the
sampled keys.
- Add clickhouse_settings: { max_execution_time:
METRIC_ATTR_KEYS_MAX_EXEC_SECONDS, timeout_overflow_mode: 'break' }
as a server-side ceiling.
Live retest against the metric source on staging confirms the
attributeValues payload still comes back rich (same k8s.* diversity
on container.cpu.usage) — the 100k-row sample is plenty for value
discovery.
Refs: #2437 (comment)
🔴 Tier 4 — CriticalTouches auth, data models, config, tasks, OTel pipeline, ClickHouse, or CI/CD. Why this tier:
Review process: Deep review from a domain expert. Synchronous walkthrough may be required. Stats
|
| return await Source.findOne({ _id: sourceId, team }); | ||
| } catch { | ||
| // Defense-in-depth: if Mongoose still throws (e.g. a future cast | ||
| // path), treat it as "not found" so the MCP tools surface a clean |
There was a problem hiding this comment.
docs(non-blocking): we should make this comment more generic since this controller is used across the app (eg so the caller can surface a clean error)
| @@ -92,14 +109,28 @@ | |||
| } | |||
| const { startDate, endDate } = timeRange; | |||
|
|
|||
There was a problem hiding this comment.
issue(existing, non-blocking): Not related to this PR, but during testing this the top reason that LLM went from using timeseries to raw sql was the following:
timeseries.ts threads granularity into the tile config (lines 135–142), but buildTile in
helpers.ts (lines 187–202) parses the input through externalDashboardTileSchemaWithId. The
line/stacked_bar sub-schema in zod.ts (lines 246–261) has no granularity field, so Zod
silently strips it.
// packages/api/src/utils/zod.ts:246-261
// ❌ granularity is not declared — Zod strips any extra fields
const lineTileSchema = z.object({
type: z.literal('line'),
// ... other fields, but no granularity
});Downstream, convertToInternalTileConfig never re-injects it — contrast with the REST charts path,
which explicitly defaults it:
// packages/api/src/routes/charts.ts:289
granularity: translatedGranularity ?? 'auto',Impact
The renderer only includes the time bucket in SELECT/GROUP BY/ORDER BY when
chartConfig.granularity != null:
// packages/api/src/renderChartConfig.ts:109-128
if (chartConfig.granularity != null) {
// adds __hdx_time_bucket to SELECT, GROUP BY, ORDER BY
}Because granularity was stripped, every MCP timeseries call for gauge/sum metrics (and
log/trace sources) returns a single collapsed row per group — no timestamp column at all —
regardless of what the agent passed.
The gauge/sum translation does bucket the inner CTE with granularity || 'auto', but then
spreads ...restChartConfig for the outer query, so the outer aggregation collapses. Histogram
only escapes this by hard-coding __hdx_time_bucket directly (line 2106).
| // 1 row, the data likely collapsed into a single time bucket. Add a | ||
| // hint so the agent knows to adjust the time range. | ||
| if ( | ||
| result.content?.[0]?.type === 'text' && |
There was a problem hiding this comment.
issue(existing, non-blocking): Another existing but more clear one now with metrics:
Bug: Hint Heuristics Fire on Wrong Signal, Recommend Wrong Fix, and Clobber Each Other
1. Single-Bucket Hint Triggers on the Wrong Signal
The "only 1 time bucket" hint in timeseries.ts:170-189 triggers on data.length === 1, which conflates three distinct cases:
- Genuine single-bucket collapse (granularity too coarse)
- A single group when
groupByis set (working as intended) - The no-time-column collapse from previous comment — where the advice to widen range or use coarser granularity can't help and "coarser" is actually backwards
A more robust signal would be to inspect result.meta for the presence of the __hdx_time_bucket column rather than counting rows. This distinguishes the no-time-column case (issue #1) from a genuine single-bucket result, and avoids misfiring when groupBy produces exactly one group.
| `aggFn:"increase" combined with groupBy is capped at the top ${INCREASE_TOP_N_CAP} ` + | ||
| `groups by max bucket sum at the renderer layer. ` + | ||
| `Results may not include every group present in the underlying data.`; | ||
| first.text = JSON.stringify(parsed, null, 2); |
There was a problem hiding this comment.
suggestion: annotateIncreaseTopNHint in helpers.ts:99-121 emits a "may be truncated" disclaimer whenever increase + groupBy returns any data — even with fewer than 20 groups — making it a false alarm rather than an actual detection.
Additionally, both this function and the single-bucket hint in timeseries.ts assign to parsed.hint directly. Since the single-bucket hint runs first (timeseries.ts:165) and annotateIncreaseTopNHint runs after (timeseries.ts:178), one silently overwrites the other with no indication either existed.
Suggested fix: change hint to hints: string[] and append at each site. Gate the truncation hint on an actual group count threshold (e.g. >= 20) rather than emitting unconditionally.
| { tableName, error: e instanceof Error ? e.message : String(e) }, | ||
| 'sampleAttributeValues failed', | ||
| ); | ||
| return {}; |
There was a problem hiding this comment.
suggestion: Failures are silently discarded in several places — describeMetric.ts:301-307, describeMetric.ts:412-418, and per-kind skips in listMetrics.ts:385-395 — causing transient ClickHouse errors to surface as the noSignal hint (describeMetric.ts:584-604):
"No data found — try widening startTime/endTime"
Agents cannot distinguish "genuinely empty" from "fetch failed" and will follow incorrect recovery steps as a result.
Suggested fix: instead of swallowing errors into empty returns, surface a partialFailure indicator in the response shape — recording which kinds failed and why. This gives agents a recoverable, accurate signal ("some metric kinds failed") rather than a misleading empty-data path.
| connectionId, | ||
| clickhouse_settings: { | ||
| max_execution_time: METRIC_ATTR_KEYS_MAX_EXEC_SECONDS, | ||
| timeout_overflow_mode: 'break', |
There was a problem hiding this comment.
suggestion: Callers can't distinguish "not sampled" from "no values". A per-key sampled: boolean/truncated: boolean would allow agents to know to adjust their query
| @@ -92,14 +109,28 @@ | |||
| } | |||
| const { startDate, endDate } = timeRange; | |||
There was a problem hiding this comment.
idea(non-blocking): probably OOS but I waned a better historic graph
It would be helpful if the timeseries tool returned per-bucket rows with a timestamp column for metric sources using increase + groupBy, rather than collapsing to per-group aggregates with no ts column — that way the bucket breakdown is actually visible in the output.
|
Overall - metrics support works very good! It fell back to sql a few times, everytime it did I asked my agent to explain why (findings above). I tested it on my local instance and a larger internal instance with tons of metrics, it worked 🥳 |
getSource is called from REST routers, external-api v2, and MCP tools alike — the inline comments referenced only the MCP surface. Reword to caller-neutral language per review feedback. Refs: #2437 (comment)
… to hints[] Two problems with the query-tool hint plumbing: 1. annotateIncreaseTopNHint emitted the 'results may be truncated' disclaimer whenever increase + groupBy returned any data — even 3 groups — making it a false alarm rather than a detection. 2. Both annotateIncreaseTopNHint and the single-bucket hint in timeseries.ts assigned to parsed.hint directly, so whichever ran second silently overwrote the other. Fixes: - Add countDistinctGroups(): resolves comma-separated groupBy segments against the result row keys (bracket-aware split) and counts distinct combinations. The truncation hint now only fires when the distinct group count reaches INCREASE_TOP_N_CAP. When no segment resolves to a row key (e.g. bracket-syntax map attrs render as arrayElement(...) aliases), skip the hint instead of crying wolf. - Replace the single hint: string with hints: string[] across the query-tool response shape (annotateIncreaseTopNHint, the single-bucket hint, and formatQueryResult's empty-result hint) via a shared appendHint() helper, so multiple advisories coexist. Tests: rewrite the annotateIncreaseTopNHint block — at/below cap, distinct-groups-vs-raw-rows, multi-column groupBy combinations, unresolvable groupBy, and append-not-overwrite coexistence. Refs: #2437 (comment)
…ist_metrics
Discovery sub-query failures were silently swallowed into empty
returns (fetchAttributeKeys / sampleAttributeValues catch → {};
listMetrics per-kind catch → continue). Transient ClickHouse errors
then surfaced as the 'No data found — try widening startTime/endTime'
hint, so agents could not distinguish 'genuinely empty' from 'fetch
failed' and followed incorrect recovery steps.
- fetchAttributeKeys / sampleAttributeValues now return a typed
{ ok, data } | { ok: false, error } result. Errors are sanitised
(single line, capped at 200 chars).
- describe_metric collects failures into partialFailure[{stage,error}]
on the response and suppresses the no-data hint when any stage
failed — empty attributeKeys can't be trusted as 'no data' then.
A retry-oriented hint is emitted instead.
- list_metrics collects per-kind fetch failures into
partialFailure[{kind,error}] and suppresses the 'No metrics matched'
hint when failures explain the empty result.
Tests: new cases pointing a kind's metricTables entry at the logs
table (exists, but no MetricName/TimeUnix columns) to force the
discovery query to throw; assert partialFailure is present and the
misleading widen-the-window hints are absent. Happy-path test asserts
partialFailure stays undefined.
Refs: #2437 (comment)
sampleAttributeValues caps value sampling at MAX_ATTR_KEYS_TO_SAMPLE (12) keys and only emits keys with non-empty samples — so a key absent from attributeValues was indistinguishable between 'skipped by the cap' and 'sampled but empty in the window'. Agents could not tell when to re-query a key directly. Add attributeValuesMeta to each kindDetail: - sampledKeys: display names actually queried for values - truncatedKeys: display names skipped because the cap fired A key in truncatedKeys was never queried; a key in sampledKeys but missing from attributeValues is genuinely empty in the window. Tool description documents the semantics so agents know to query truncated keys directly. Tests: 15-key metric → 12 sampled + 3 truncated, disjoint sets covering all keys; small metric → truncatedKeys empty. Refs: #2437 (comment)
Summary
Why: Metric sources were MVP-only in MCP —
describe_sourceearly-returned with justmetricTables, builder tiles on metric sources saved but failed at render time, the query tools couldn't carrymetricName/metricType, and the prompt told agents to fall back to raw SQL. Closes that gap.What changed:
clickstack_list_metrics— paginated metric-name catalog withkind,namePattern(ILIKE), time-window filters, and opaque cursor pagination.clickstack_describe_metric— per-metric unit, description, attribute keys, sampled values.kindis required (paired withmetricNameby every upstream discovery tool, so requiring it eliminates the multi-kind probe path).clickstack_describe_sourcenow picks a representative metric table (gauge → sum → histogram) and runs the existing column / map-key / value-sampling pipeline against it, plus a per-kind metric-name sample.clickstack_timeseries/clickstack_tableacceptmetricType,metricName,isDeltaon each select item, plusaggFn:"increase"for Sum counters.valueExpressiondefaults to"Value". Emits a hint when the renderer's 20-group cap onincrease + groupBymay apply. Reject up-front whenmetricTypeis paired with a non-metric source (or omitted on a metric source) instead of falling through to a cryptic ClickHouse error.summaryand"exponential histogram"kinds (no renderer support).Live MCP testing findings & fixes
Live testing against a populated metric source surfaced 5 issues. All are fixed in this PR.
Cast to ObjectId failed for value "..." (type string) at path "_id" for model "Source"leaked throughclickstack_describe_source/_list_metrics/_describe_metricon a badsourceIdgetSource()didn't guard against malformed inputObjectId.isValid+ try/catch aroundfindOne; all tools surface their existing "Source not found" branch (19bc4205)clickstack_timeseries/clickstack_tableacceptedmetricType+metricNameon non-metric sources and failed at SQL render withUnknown expression or function identifier "Value"runConfigTileassertSourceKindMatchesSelect()helper called between source-not-found and the connection lookup — both directions of the mismatch return agent-actionable errors (7001a915)metricTablesmap inclickstack_list_sources/_describe_sourceresponses included a stray_idObjectId alongside the valid kind keystype: { … }Mongoose subdoc auto-adds_idSchema(..., { _id: false })formetricTables; defense-in-depthsanitizeMetricTables()at the MCP response boundary (7ef9d9ca)clickstack_describe_metricconsistently timed out at 10 s on multi-kind metrics (e.g.container.cpu.usagelives in both gauge and sum)fetchAttributeKeysran an unboundedWHERE MetricName = ?scan with no time filterkind(every upstream tool already pairs it with name); boundfetchAttributeKeyswith a CTE sample LIMIT + server-sidemax_execution_time(30203f5e)clickstack_describe_sourceon a populated metric source timed out on the cold call, then succeeded on retrytimestampValueExpression, so the helpers' time filter was silently droppedsource.timestampValueExpressionthrough togetMapKeys, bothgetAllKeyValuescalls, andsampleMetricNamesForKind(31bc21b5)Screenshots or video
N/A — backend-only.
How to test on Vercel preview
N/A — non-UI change.
References