
Design: Heuristic + LLM Dual-Backend for dashcli suggest

Generated by /office-hours on 2026-03-26
Branch: worktree-dashcli-suggest
Repo: agi-bootstrap/dashcli
Status: APPROVED
Mode: Startup

Problem Statement

dashcli's suggest command currently requires ANTHROPIC_API_KEY — an LLM call is the only path from CSV to dashboard spec. This creates a hard adoption barrier: every data scientist who runs dashcli suggest data.csv without an API key configured hits an error and bounces. For a tool positioned as "the entry point that beats Streamlit," requiring an API key is more config than Streamlit, not less.

The heuristic approach eliminates this barrier: column profiling + deterministic spec generation → a useful dashboard from any CSV in under 100ms, zero config, zero cost, fully offline. The LLM approach remains available as an opt-in flag for users who want semantically richer suggestions.

Demand Evidence

  • Direct observation: data scientists at ByteDance/TikTok are vibe-coding ad-hoc dashboards with inconsistent results
  • The "agent-first" gap between Aeolus (exabyte-scale, human-first) and one-off vibe-coded dashboards is structural
  • Streamlit adoption data: 90%+ of Fortune 50 use it for data apps — zero-config is the proven adoption pattern
  • Landscape research: Quesma's experience building Grafana dashboards with AI+CLI confirms that data profiling is firmly in the "deterministic functions just work" category

Status Quo

Current dashcli suggest: Calls Anthropic API → gets 3-5 YAML specs → writes files to disk. Requires API key. Non-deterministic. Not composable (writes files, not stdout). ~5-15s per run.

Streamlit: streamlit run app.py → instant dashboard. Zero config. But: Python-only, not agent-friendly, no declarative spec.

The gap: No tool offers point-at-CSV → deterministic-YAML-on-stdout that agents can compose.

Target User & Narrowest Wedge

Target user: Data scientists using AI coding tools who need quick dashboards for ad-hoc data without overhead.

Narrowest wedge: dashcli suggest data.csv → single YAML spec on stdout in under 100ms, zero config. The fastest path from data to managed dashboard artifact.

Constraints

  1. Heuristic suggest must work with zero environment variables — no API key, no network
  2. Both backends (heuristic and LLM) must output to stdout for composability
  3. Must not break existing dashcli suggest users — the --ai flag provides the LLM path
  4. Column profiling must handle edge cases: no dates, no measures, special characters in column names
  5. Generated specs must pass validateSpec() (which uses DashboardSpec.safeParse(), per the existing codebase pattern)

Premises

  1. Heuristic suggest is the right default — eliminates the API key barrier, makes suggest a deterministic composable building block. Agents already have LLMs; the tool layer should be fast and predictable.
  2. One command, two backends: dashcli suggest data.csv (heuristic, default) and dashcli suggest data.csv --ai (LLM, opt-in). No separate commands.
  3. Column profiling via type + name regex + value pattern matching is sufficient for the default path — structural basics (KPI + bar + line + table), not semantic analysis.
  4. YAML on stdout for both backends — composability over convenience. Multi-document YAML (--- separators) for --ai mode's multiple specs.
  5. The plan's layout algorithm covers the 80% case — KPIs row 0, bar+line row 1, detail table row 2, with graceful degradation.

Approaches Considered

Approach A: Heuristic-Only Replace

Replace LLM suggest entirely with heuristic algorithm. Simplest diff. Loses LLM capability for power users. Completeness: 6/10.

Approach B: Dual-Backend — Heuristic Default + LLM Opt-in (CHOSEN)

Implement heuristic suggest as default. Refactor LLM suggest behind --ai flag. Both output to stdout. Heuristic outputs 1 spec, --ai outputs multiple specs with YAML document markers. Completeness: 9/10.

Approach C: Layered Profile + Suggest

Split into dashcli profile (JSON) and dashcli suggest (YAML). Most agent-composable but adds API surface and loses instant smart dashboards for humans. Completeness: 7/10.

Recommended Approach

Approach B: Dual-Backend — chosen because it covers both the zero-config adoption path (heuristic default) and the power-user path (LLM via --ai), with consistent stdout output for both.

Architecture

src/suggest.ts  — refactored
  ├── Column profiling (NEW)
  │   ├── profileCsv(csvPath) → ProfileResult
  │   │   classifies columns as date/measure/dimension
  │   │   computes date ranges, dimension cardinalities
  │   └── Types: ProfileResult, ColumnClass
  │
  ├── Heuristic spec generation (NEW)
  │   ├── generateSpec(profile, csvBasename) → DashboardSpec
  │   │   deterministic: dates/dims/measures → KPIs + bar + line + table
  │   └── Grid layout algorithm from plan.md
  │
  ├── LLM spec generation (EXISTING, refactored)
  │   ├── suggestAI() — renamed from suggestDashboards(), returns YAML string on stdout
  │   │   (breaking change: OK since v0.1 pre-release, no external consumers)
  │   └── buildSchemaSummary() — unchanged
  │
  └── Orchestrator
      ├── suggest(csvPath) → string (YAML on stdout)
      └── suggestAI(csvPath) → string (multi-doc YAML on stdout)

src/index.ts — modified
  └── suggest command: parse --ai flag, route to heuristic or LLM backend
      Both: process.stdout.write(result)
      --ai requires ANTHROPIC_API_KEY (error if missing)
      Default: no API key needed

Column Classification Algorithm

For each column in CSV:
  1. date   — sqlType === "TEXT" AND (
              name matches /date|time|month|year|created_at|updated_at|timestamp|_at$/i
              OR first 5 non-null values match /^\d{4}-\d{2}-\d{2}/)
  2. measure — sqlType === "INTEGER" or "REAL"
  3. dimension — everything else (TEXT, not date)

Gates and edge cases:

  • Name-based date check requires TEXT type (prevents INTEGER "year" column misclassification).
  • Value-based date detection samples first 5 non-null values; all must match to classify as date.
  • If a column's first data rows are empty/null, SQLite types it as TEXT. The profiler samples beyond the first row to avoid misclassification — check up to 10 non-null values for type inference.
  • Single-row data: If the CSV has only 1 data row, generate a table-only layout (aggregations over a single value are trivial and confusing).
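The classification rules above can be sketched as a small pure function. This is a hedged illustration, not the actual implementation: `ColumnClass` matches the type named in the architecture, but the function signature and sampling shape are assumptions.

```typescript
// Illustrative sketch of the column classification rules; the signature and
// sampling shape are assumptions, only the rules themselves come from the design.
type ColumnClass = "date" | "measure" | "dimension";

const DATE_NAME_RE = /date|time|month|year|created_at|updated_at|timestamp|_at$/i;
const ISO_DATE_RE = /^\d{4}-\d{2}-\d{2}/;

function classifyColumn(
  name: string,
  sqlType: "TEXT" | "INTEGER" | "REAL",
  sampleValues: string[], // first 5 non-null values
): ColumnClass {
  if (sqlType === "TEXT") {
    // Name-based date check is gated on TEXT, so an INTEGER "year" stays a measure.
    if (DATE_NAME_RE.test(name)) return "date";
    // Value-based date check: all sampled values must look like ISO dates.
    if (sampleValues.length > 0 && sampleValues.every((v) => ISO_DATE_RE.test(v))) {
      return "date";
    }
    return "dimension";
  }
  return "measure"; // INTEGER or REAL
}
```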

Layout Algorithm

Row 0: KPIs — 1 per measure, capped at grid width (3-4 columns)
Row 1: Bar (dim[0] × measure[0]) + Line (date[0] × measure[0])
Row 2: Detail table (all dims + all measures)

Graceful degradation:
  - No dimensions → no bar chart, line spans full width
  - No dates → no line chart, bar spans full width
  - No measures → table only (raw SELECT of dimensions)

Grid columns: Math.max(3, Math.min(measures.length, 4))
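The row layout and its degradation rules can be sketched as follows. The `Widget` shape here is an assumption for illustration, not the real DashboardSpec schema; only the row assignments, the KPI cap, and the full-width fallbacks come from the design.

```typescript
// Sketch of the layout algorithm; Widget is a hypothetical shape, not DashboardSpec.
interface Widget { type: string; row: number; span: "half" | "full" }

function layoutWidgets(dates: string[], dims: string[], measures: string[]): Widget[] {
  const widgets: Widget[] = [];
  // Row 0: one KPI per measure, capped at the grid width.
  const gridCols = Math.max(3, Math.min(measures.length, 4));
  measures.slice(0, gridCols).forEach(() =>
    widgets.push({ type: "kpi", row: 0, span: "half" }));
  // Row 1: bar needs a dimension, line needs a date; a lone chart spans full width.
  const hasBar = dims.length > 0 && measures.length > 0;
  const hasLine = dates.length > 0 && measures.length > 0;
  if (hasBar) widgets.push({ type: "bar", row: 1, span: hasLine ? "half" : "full" });
  if (hasLine) widgets.push({ type: "line", row: 1, span: hasBar ? "half" : "full" });
  // Row 2: detail table is always present.
  widgets.push({ type: "table", row: 2, span: "full" });
  return widgets;
}
```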

Filters

  • First date column → date_range filter (min/max as defaults)
  • Each dimension with ≤15 distinct values → dropdown filter (default: "all")

The profiler stores both the distinct count and (for dimensions with ≤15 distinct values) the actual distinct values, so filter defaults can be populated. The 15-value threshold keeps dropdown menus usable; higher cardinality dimensions are excluded from filters.
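A sketch of the filter derivation, assuming a profile shape along the lines implied above; the field names (`distinctCount`, `values`) and the filter object layout are illustrative, while the first-date rule and the 15-value threshold come from the design.

```typescript
// Hypothetical filter derivation; shapes are assumptions, rules are from the design.
interface DimProfile { name: string; distinctCount: number; values?: string[] }
interface DateProfile { name: string; min: string; max: string }

const DROPDOWN_MAX = 15; // cardinality cap for usable dropdowns

function buildFilters(dateCols: DateProfile[], dims: DimProfile[]): object[] {
  const filters: object[] = [];
  // First date column becomes a date_range filter with min/max as defaults.
  if (dateCols.length > 0) {
    const d = dateCols[0];
    filters.push({ type: "date_range", column: d.name, default: [d.min, d.max] });
  }
  // Low-cardinality dimensions become dropdown filters defaulting to "all".
  for (const dim of dims) {
    if (dim.distinctCount <= DROPDOWN_MAX && dim.values) {
      filters.push({ type: "dropdown", column: dim.name, options: dim.values, default: "all" });
    }
  }
  return filters;
}
```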

Output Format

Heuristic (default): Single YAML document on stdout.

LLM (--ai): Multiple YAML documents separated by --- on stdout. Each document is a complete, valid dashboard spec.
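The two output shapes reduce to a single-document emit and a `---`-joined multi-document emit. In this sketch `stringifyYaml` stands in for the `stringify` export of the existing yaml dependency (called with `lineWidth: 0`); the function names are illustrative.

```typescript
// Sketch of the two output shapes; stringifyYaml stands in for yaml's stringify.
type StringifyFn = (spec: object) => string;

function emitHeuristic(spec: object, stringifyYaml: StringifyFn): string {
  // Heuristic mode: a single YAML document on stdout.
  return stringifyYaml(spec);
}

function emitAI(specs: object[], stringifyYaml: StringifyFn): string {
  // --ai mode: one complete spec per document, separated by --- markers.
  return specs.map(stringifyYaml).join("---\n");
}
```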

Key Implementation Details

  • KPI queries alias result as value (viewer expects row.value)
  • SQL identifiers double-quote escaped: "col".replace(/"/g, '""')
  • WHERE clause: all filter placeholders joined with AND
  • Generated spec validated with validateSpec() (uses DashboardSpec.safeParse()) before output
  • source field: ./basename.csv (relative path)
  • YAML via stringify from existing yaml dependency (lineWidth: 0)
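The escaping and KPI-query details above can be sketched together; the doubled-quote escaping is quoted directly from the design, while `kpiQuery`, the `SUM` aggregate, and the table-name parameter are illustrative.

```typescript
// escId follows the escaping rule from the design; kpiQuery is an illustrative shape.
function escId(id: string): string {
  // Double-quote escaping: embedded quotes are doubled, whole identifier quoted.
  return `"${id.replace(/"/g, '""')}"`;
}

function kpiQuery(measure: string, table: string, filterClauses: string[]): string {
  // The viewer reads row.value, so the aggregate must be aliased as value.
  const where = filterClauses.length > 0 ? ` WHERE ${filterClauses.join(" AND ")}` : "";
  return `SELECT SUM(${escId(measure)}) AS value FROM ${escId(table)}${where}`;
}
```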

Exported API

// NEW — heuristic path
export function profileCsv(csvPath: string): ProfileResult
export function generateSpec(profile: ProfileResult, csvBasename: string): DashboardSpec
export function suggest(csvPath: string): string  // synchronous orchestrator → YAML string

// EXISTING — renamed from suggestDashboards(), refactored to return string instead of writing files
// Breaking change: OK since v0.1 pre-release with no external consumers
export async function suggestAI(csvPath: string, options?: SuggestOptions): Promise<string>

Note: The heuristic suggest() is synchronous (pure computation). In index.ts, call it directly (no await). The --ai path via suggestAI() remains async.
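The index.ts routing described above might look like the following. Backends are injected here purely to keep the sketch self-contained; error handling and CLI parsing are simplified, and only the flag gating and the sync-vs-async split come from the design.

```typescript
// Illustrative routing sketch; real index.ts calls the exports directly.
type Backends = {
  suggest: (csvPath: string) => string;          // sync heuristic path
  suggestAI: (csvPath: string) => Promise<string>; // async LLM path
};

async function runSuggest(csvPath: string, aiFlag: boolean, b: Backends): Promise<string> {
  if (aiFlag) {
    // --ai path requires the API key up front; fail fast if it is missing.
    if (!process.env.ANTHROPIC_API_KEY) {
      throw new Error("--ai requires ANTHROPIC_API_KEY");
    }
    return b.suggestAI(csvPath);
  }
  // Default heuristic path: pure computation, no key, no network.
  return b.suggest(csvPath);
}
```

The caller would pass the result straight to `process.stdout.write`.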

Files Changed

| File | Action | Est. Lines |
| --- | --- | --- |
| src/suggest.ts | Major refactor — add heuristic path, refactor LLM to stdout | ~350-400 |
| src/index.ts | Modify — parse --ai flag, route accordingly | ~10 |
| test/suggest.test.ts | Major rewrite — heuristic tests + refactored LLM tests | ~300 |

Open Questions

  1. Stdin piping for serve: dashcli serve currently takes a file path and uses existsSync(). For dashcli suggest data.csv | dashcli serve - to work, serve needs stdin support. This is a separate feature — defer to a follow-up milestone. For now, users write to a file: dashcli suggest data.csv > spec.yaml && dashcli serve spec.yaml.
  2. Multi-doc YAML piping: Can dashcli serve accept multi-document YAML from stdin, or does it need a new --pick flag to select one spec? (Deferred to implementation.)
  3. JSON data sources: The plan focuses on CSV. The heuristic profiler should also handle JSON via the existing loadDataSource() path — but the profiling logic assumes tabular data. Confirm JSON works with the same column classification.
  4. Future dashcli profile command: Approach C's profile primitive could be added later as a separate command without conflicting with this design. Worth considering post-launch.

Success Criteria

  1. dashcli suggest sample/sales.csv outputs valid YAML to stdout with zero env vars — in under 100ms
  2. dashcli suggest sample/sales.csv --ai outputs multi-doc YAML to stdout (requires API key)
  3. Generated spec passes validateSpec() validation (uses DashboardSpec.safeParse())
  4. All existing tests pass + new heuristic tests (20+ test cases per plan.md)
  5. No regression in LLM suggest quality — same prompts, same schema summary

Distribution Plan

Existing: dashcli is a bun-based CLI installed locally. No new distribution artifact needed.

Dependencies

  • None for heuristic path (pure computation on existing data layer)
  • @anthropic-ai/sdk remains a dependency for --ai flag (already installed)

The Assignment

Run dashcli suggest sample/sales.csv with the heuristic implementation and compare the output to what dashcli suggest sample/sales.csv --ai generates. Show the heuristic output to a data scientist colleague and ask: "Would you use this as a starting point, or would you rather write the spec from scratch?" Their answer tells you whether the heuristic covers the 80% case.

Accepted Expansions (from /autoplan review)

  1. ID/cardinality guard for profiler — columns where cardinality equals row count (e.g., user_id, order_id) should be excluded from measures. Also exclude columns matching /_id$/i or /_key$/i name patterns. Prevents SUM(user_id) nonsense in KPIs.

  2. dashcli profile command — expose profileCsv() as dashcli profile data.csv outputting JSON to stdout. ~20 lines in index.ts. The most agent-aligned primitive — agents can read the profile, reason about it, and generate custom specs.

  3. escId() DRY consolidation — extract the 3 copies (suggest.ts, csv.ts, query.ts) into a shared location. Import from one source.

  4. Determinism test — add success criterion: same CSV input produces identical YAML output across runs.

  5. Label humanization — generate chart labels from column names: total_revenue → Total Revenue.

  6. Time-box note — this milestone should take 1-2 days of CC implementation. If the profiler is still being tweaked after that, scope is creeping.
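Two of the accepted expansions (the ID/cardinality guard and label humanization) are small enough to sketch directly. The regex and the cardinality rule come from expansion #1 and the label example from #5; the helper names are hypothetical.

```typescript
// Sketches of expansions #1 and #5; helper names are illustrative.
const ID_NAME_RE = /_id$|_key$/i;

function isMeasureCandidate(name: string, distinctCount: number, rowCount: number): boolean {
  // Exclude ID-like names and columns whose cardinality equals the row count,
  // so KPIs never emit SUM(user_id)-style nonsense.
  if (ID_NAME_RE.test(name)) return false;
  if (rowCount > 0 && distinctCount === rowCount) return false;
  return true;
}

function humanizeLabel(column: string): string {
  // total_revenue -> Total Revenue
  return column
    .split(/[_\s]+/)
    .filter(Boolean)
    .map((w) => w.charAt(0).toUpperCase() + w.slice(1))
    .join(" ");
}
```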

Updated Files Changed

| File | Action | Est. Lines |
| --- | --- | --- |
| src/suggest.ts | Major refactor — heuristic + LLM refactor + ID guard | ~400 |
| src/index.ts | Modify — --ai flag, profile command, routing | ~30 |
| test/suggest.test.ts | Major rewrite — 38 test cases | ~350 |
| src/utils.ts (new) | Extract shared escId() | ~5 |

Decision Audit Trail

| # | Phase | Decision | Principle | Rationale | Rejected |
| --- | --- | --- | --- | --- | --- |
| 1 | CEO | Accept Approach B (dual-backend) | P6 | Validated in /office-hours | A, C |
| 2 | CEO | Mode: SELECTIVE EXPANSION | P1+P2 | Feature enhancement default | |
| 3 | CEO | Defer dashcli profile as separate cmd | P3 | Separate feature | |
| 4 | CEO | Defer stdin serve | P3 | Separate feature | |
| 5 | CEO | Accept escId DRY fix | P2+P4 | In blast radius, trivial | |
| 6 | CEO | Accept ID/cardinality guard | P1+P5 | Critical gap from subagent | |
| 7 | CEO | Accept dashcli profile command | P1+P2 | ~20 lines, agent-aligned | Override #3 |
| 8 | CEO | Defer dashcli data.csv auto-serve | P3+P6 | Requires stdin serve | TODOS.md |
| 9 | CEO | Defer template approach | P3 | Separate milestone | TODOS.md |
| 10 | CEO | Accept determinism test | P5 | Trivial, validates key claim | |
| 11 | CEO | Accept time-box note | P6 | Prevents scope creep | |
| 12 | CEO | Defer competitive analysis expansion | P3 | Informational | TASTE |
| 13 | Design | Accept label humanization | P5 | Prevents generic chart titles | |
| 14 | Eng | Write 38-test plan | P1 | Complete coverage | |
| 15 | Design | Accept chart title templates | P5 | Prevents generic titles | |
| 16 | Design | Accept error handling spec | P1 | Critical gap from subagent | |
| 17 | Design | Accept stderr success line | P5 | Human feedback w/o stdout | |
| 18 | Design | Exclude all-null columns | P1 | Prevents useless filters | |
| 19 | Eng | deriveTableName DRY fix | P2+P4 | Prevents silent bugs | |
| 20 | Eng | Consider profiler.ts split | P5 | Clean seam for profile cmd | TASTE |
| 21 | Eng | Fix csv.ts multi-row sampling | P1 | Root cause of type misclass | |
| 22 | Eng | Zero measures+dims fallback | P1 | Edge case with no output | |
| 23 | Eng | Cap table columns at ~20 | P3 | Prevents unusable tables | |

What I noticed about how you think

  • You didn't accept the binary choice on Premise 1. When I asked "heuristic or LLM?", you said "help me compare the two" and then chose "both." That's product thinking — refusing to accept a false tradeoff when you can have both with minimal extra cost.
  • You're building v0.1.3.0 of a tool nobody outside your team uses yet, but you're already thinking about adoption barriers (API key friction) and composability (stdout piping). You're building infrastructure, not a demo.
  • You chose the 9/10 completeness option over the 6/10 simple option. You're not cutting corners on a feature that's supposed to be the "entry point that beats Streamlit."

GSTACK REVIEW REPORT

| Review | Trigger | Why | Runs | Status | Findings |
| --- | --- | --- | --- | --- | --- |
| CEO Review | /plan-ceo-review | Scope & strategy | 1 | clean | 5 premises confirmed, 1 expansion accepted |
| CEO Voice | autoplan-voices | Independent challenge | 1 | issues_open | 10 findings (1 critical, 4 high), 5 accepted |
| Design Review | /plan-design-review | UI/UX gaps | 1 | clean | Label humanization accepted, all states covered |
| Eng Review | /plan-eng-review | Architecture & tests | 1 | clean | 38-test plan, no critical gaps |

VERDICT: REVIEWED — 14 auto-decisions logged, 1 taste decision surfaced at gate. Plan ready for implementation with accepted expansions.