Merged
36 commits
b5f8181
feat: add local dashboard SPA with React + Vite
WellDunDun Mar 12, 2026
5ec0ec5
fix: address CodeRabbit review feedback for local dashboard
WellDunDun Mar 12, 2026
4f20227
fix: address second round CodeRabbit review feedback
WellDunDun Mar 12, 2026
4c65a75
Merge remote-tracking branch 'origin/dev' into WellDunDun/local-dashb…
WellDunDun Mar 12, 2026
efbfa5c
feat: align local dashboard SPA with SQLite v2 data architecture
WellDunDun Mar 12, 2026
df7ad87
Merge origin/dev into WellDunDun/local-dashboard-spa
WellDunDun Mar 12, 2026
a56aead
fix: address CodeRabbit review feedback on local dashboard SPA
WellDunDun Mar 12, 2026
b3560a7
fix: sort imports to satisfy Biome organizeImports lint
WellDunDun Mar 12, 2026
515db2b
fix: address CodeRabbit nitpicks - cross-platform dev script, stricte…
WellDunDun Mar 12, 2026
675d64d
fix: add UNKNOWN status filter and extract header height CSS variable
WellDunDun Mar 12, 2026
3638582
fix: hoist sidebar collapse state to layout and add UNKNOWN filter style
WellDunDun Mar 12, 2026
86b3bcf
Merge remote-tracking branch 'origin/dev' into WellDunDun/local-dashb…
WellDunDun Mar 12, 2026
788ca66
feat: serve SPA as default dashboard, legacy at /legacy/
WellDunDun Mar 12, 2026
86a6bfe
feat: add shadcn theming with dark/light toggle and selftune branding
WellDunDun Mar 12, 2026
775364f
fix: path traversal check and 404 for missing skills
WellDunDun Mar 12, 2026
cd6eebc
fix: biome formatting - semicolons, import order, line length
WellDunDun Mar 12, 2026
5c0cb5a
fix: address CodeRabbit review - dedupe polling, fix stale closures, …
WellDunDun Mar 12, 2026
b447874
fix: address CodeRabbit review round 2 - shared sorting, DnD fixes, T…
WellDunDun Mar 13, 2026
6d8d514
feat: add evidence viewer, evolution timeline, and enhanced skill report
WellDunDun Mar 14, 2026
8a97b9e
fix: address CodeRabbit review round 3 - DnD/sort conflict, theme lis…
WellDunDun Mar 14, 2026
339f545
feat: show selftune version in sidebar footer
WellDunDun Mar 14, 2026
83c3c13
fix: biome formatting in dashboard-server - line length wrapping
WellDunDun Mar 14, 2026
db59a86
fix: address CodeRabbit review round 4 - dedupe formatRate, STATUS_CO…
WellDunDun Mar 14, 2026
61720d8
fix: remove redundant items prop from Select to avoid duplication
WellDunDun Mar 14, 2026
f32e062
fix: add sortableKeyboardCoordinates to KeyboardSensor for proper key…
WellDunDun Mar 14, 2026
0e357bf
feat: Linear-style dashboard UX - collapsible sidebar, direct skill l…
WellDunDun Mar 14, 2026
0e54720
fix: address CodeRabbit review - accessibility, semantics, state reset
WellDunDun Mar 14, 2026
bbbb0d7
feat: add TanStack Query and optimize SQL queries for dashboard perfo…
WellDunDun Mar 14, 2026
c6918c2
chore: track root bun.lock for reproducible installs
WellDunDun Mar 14, 2026
37363f9
fix: address CodeRabbit review - collapsible sync, drag handle dedup,…
WellDunDun Mar 14, 2026
a2d84f7
fix: address CodeRabbit nitpicks - version pinning, changelog clarity…
WellDunDun Mar 14, 2026
d68a473
fix: address CodeRabbit round 3 - deps, async fs, type safety, determ…
WellDunDun Mar 14, 2026
95a2c1e
fix: resolve Biome lint and format errors in CI
WellDunDun Mar 14, 2026
845bb66
fix: address CodeRabbit round 4 - CTE subqueries, type alignment, sco…
WellDunDun Mar 14, 2026
dbdc702
fix: address CodeRabbit round 5 - startup guard, 404 heuristic, deter…
WellDunDun Mar 14, 2026
24f64d7
fix: address CodeRabbit round 6 - db guard, refresh throttle, deferre…
WellDunDun Mar 14, 2026
3 changes: 2 additions & 1 deletion .gitignore
@@ -1,5 +1,5 @@
node_modules/
bun.lock
pnpm-lock.yaml
*.tsbuildinfo
.context/
.claude/worktrees/
@@ -16,6 +16,7 @@ Thumbs.db
coverage/
tests/sandbox/results/
.test-data/
.playwright-cli/

# Internal business strategy (kept locally, not in public repo)
docs/strategy/
17 changes: 13 additions & 4 deletions ARCHITECTURE.md
@@ -18,7 +18,7 @@
| Auto-Activation | `cli/selftune/hooks/auto-activate.ts`, `cli/selftune/activation-rules.ts` | UserPromptSubmit hook with configurable trigger rules | B |
| Memory & Context | `cli/selftune/memory/writer.ts` | 3-file evolution memory persistence (~/.selftune/memory/) | B |
| Enforcement Guardrails | `cli/selftune/hooks/evolution-guard.ts`, `cli/selftune/hooks/skill-change-guard.ts` | PreToolUse hooks blocking unguarded SKILL.md edits | B |
| Dashboard | `cli/selftune/dashboard.ts`, `cli/selftune/dashboard-server.ts`, `dashboard/` | HTML dashboard + live Bun.serve server with SSE | B |
| Dashboard | `cli/selftune/dashboard.ts`, `cli/selftune/dashboard-server.ts`, `apps/local-dashboard/` | React SPA dashboard + live Bun.serve server with SQLite-backed v2 API | B |
| Specialized Agents | `.claude/agents/*.md` | Purpose-built agents (diagnosis, pattern, reviewer, integration) | B |
| Skill | `skill/` | Agent-facing skill (routing table + workflows + references) | B |

@@ -91,8 +91,17 @@ cli/selftune/
├── evolution-reviewer.md Review proposed skill evolutions
└── integration-guide.md Guide project integration setup

dashboard/ HTML dashboard template
└── index.html Skill-health-centric SPA with embedded JSON data
apps/local-dashboard/ React SPA dashboard (Vite + TypeScript + shadcn/ui)
├── src/
│ ├── pages/ Overview + SkillReport routes
│ ├── components/ Sidebar, skill grid, evidence viewer, evolution timeline
│ ├── hooks/ useOverview (polling), useSkillReport
│ └── types.ts TypeScript interfaces matching v2 API payloads
├── vite.config.ts Dev proxy → dashboard-server, build to dist/
└── package.json React 19, Tailwind v4, shadcn/ui, recharts

dashboard/ Legacy HTML dashboard (served at /legacy/)
└── index.html Original embedded-JSON dashboard (v1 endpoints)

templates/ Settings and config templates
β”œβ”€β”€ single-skill-settings.json
@@ -130,7 +139,7 @@ tests/sandbox/
| Monitoring | `cli/selftune/monitoring/` | `watch.ts` | Post-deploy regression detection | Shared, Evolution/audit |
| Status | `cli/selftune/` | `status.ts` | Skill health summary CLI | Shared, Monitoring, Evolution/audit |
| Last | `cli/selftune/` | `last.ts` | Last session insight CLI | Shared only |
| Dashboard | `cli/selftune/` | `dashboard.ts`, `dashboard-server.ts` | HTML dashboard builder + live SSE server | Shared, Monitoring, Evolution/audit |
| Dashboard | `cli/selftune/`, `apps/local-dashboard/` | `dashboard.ts`, `dashboard-server.ts`, React SPA | React SPA with SQLite-backed v2 API + legacy HTML builder + live server | Shared, Monitoring, Evolution/audit, LocalDB |
| Agents | `.claude/agents/` | `diagnosis-analyst.md`, `pattern-analyst.md`, `evolution-reviewer.md`, `integration-guide.md` | Specialized Claude Code agents | Reads log schema + config |
| Skill | `skill/` | `SKILL.md`, `Workflows/*.md`, `references/*.md`, `settings_snippet.json` | Agent-facing routing, workflows, domain knowledge | Reads log schema + config |
| Sandbox | `tests/sandbox/` | `run-sandbox.ts`, `fixtures/`, `docker/` | Sandbox test harness and Docker integration tests | All modules (test-only) |
17 changes: 17 additions & 0 deletions CHANGELOG.md
@@ -9,6 +9,23 @@ and this project adheres to [Semantic Versioning](https://semver.org/).

### Added

- **Local Dashboard SPA** - React + Vite + TypeScript SPA replacing the legacy embedded-HTML dashboard as the default view
  - Overview page with KPI cards, skill health grid with status filters, evolution feed, unmatched queries
  - Per-skill drilldown with usage stats, invocation records, evidence viewer, evolution timeline, pending proposals
  - Collapsible sidebar navigation listing all skills by health status
  - shadcn/ui component library with dark/light theme toggle and selftune branding
  - TanStack Query for data fetching with smart caching, background refetch, and instant back-navigation
  - 15-second background polling against SQLite-backed v2 API endpoints via TanStack Query `refetchInterval` (SSE was removed; SQLite reads are cheap enough for polling)
  - New components: `EvidenceViewer`, `EvolutionTimeline`, `ActivityTimeline`, `SkillHealthGrid`, `SectionCards`, `InfoTip`
  - Glossary tooltips on all metric labels (overview KPI cards, skill report KPI cards) explaining what each metric measures
  - Tab description tooltips on skill report tabs (Evidence, Invocations, Prompts, Sessions, Pending)
  - Collapsible lifecycle legend in evolution timeline explaining proposal stages (Created, Validated, Deployed, Rejected, Rolled Back)
  - Evidence context banner explaining the evidence trail concept
  - Renamed "Per-Entry Results" to "Individual Test Cases" for clarity
  - Onboarding flow: full empty-state guide for first-time users (3-step setup), dismissible welcome banner for returning users (localStorage-persisted)
- **SQLite v2 API endpoints** - `GET /api/v2/overview` and `GET /api/v2/skills/:name` backed by materialized SQLite queries (`getOverviewPayload()`, `getSkillReportPayload()`, `getSkillsList()`)
- **SQL query optimizations** - Replaced `NOT IN` subqueries with `LEFT JOIN + IS NULL`, moved JS-side dedup to SQL `GROUP BY`, added `LIMIT 200` to unbounded evidence queries
- **SPA serving from dashboard server** - Built SPA served at `/`, legacy HTML dashboard moved to `/legacy/`
- **Source-truth-driven pipeline** - Transcripts and rollouts are now the authoritative source; `sync` rebuilds repaired overlays from source data rather than relying solely on hook-time capture
- **Telemetry contract package** - `@selftune/telemetry-contract` workspace package with canonical schema types, validators, versioning, metadata, and golden fixture tests
- **Test split** - `make test-fast` / `make test-slow` and `bun run test:fast` / `bun run test:slow` for faster development feedback loop
4 changes: 2 additions & 2 deletions README.md
@@ -87,7 +87,7 @@ A continuous feedback loop that makes your skills learn and adapt. Automatically
- **Per-stage model control** - `--validation-model`, `--proposal-model`, and `--gate-model` flags give fine-grained control over which model runs each evolution stage.
- **Auto-activation system** - Hooks detect when selftune should run and suggest actions
- **Enforcement guardrails** - Blocks SKILL.md edits on monitored skills unless `selftune watch` has been run
- **Live dashboard server** - `selftune dashboard --serve` with SSE auto-refresh and action buttons
- **React SPA dashboard** - `selftune dashboard` serves a React SPA with skill health grid, per-skill drilldown, evidence viewer, evolution timeline, dark/light theming, and SQLite-backed v2 API (legacy dashboard at `/legacy/`)
- **Evolution memory** - Persists context, plans, and decisions across context resets
- **4 specialized agents** - Diagnosis analyst, pattern analyst, evolution reviewer, integration guide
- **Sandbox test harness** - Comprehensive automated test coverage, including devcontainer-based LLM testing
@@ -110,7 +110,7 @@ A continuous feedback loop that makes your skills learn and adapt. Automatically
| `selftune import-skillsbench` | Import external eval corpus from [SkillsBench](https://github.com/benchflow-ai/skillsbench) |
| `selftune badge --skill <name>` | Generate skill health badge SVG |
| `selftune watch --skill <name>` | Monitor after deploy. Auto-rollback on regression. |
| `selftune dashboard` | Open the visual skill health dashboard |
| `selftune dashboard` | Open the React SPA dashboard (SQLite-backed) |
| `selftune replay` | Backfill data from existing Claude Code transcripts |
| `selftune doctor` | Health check: logs, hooks, config, permissions |

7 changes: 7 additions & 0 deletions ROADMAP.md
@@ -10,6 +10,13 @@
- Hosted badge service at `badge.selftune.dev`
- CLI `contribute --submit` for sharing skill data
- Agent-first skill restructure (init command, routing + workflows)
- Local Dashboard SPA:
  - React + Vite + TypeScript SPA with shadcn/ui and Tailwind v4
  - Overview page with KPI cards, skill health grid, evolution feed
  - Per-skill drilldown with evidence viewer, evolution timeline
  - SQLite v2 API endpoints (`/api/v2/overview`, `/api/v2/skills/:name`)
  - Dark/light theme toggle with selftune branding
  - SPA served at `/`, legacy HTML dashboard at `/legacy/`

## In Progress
- Multi-agent sandbox expansion
1 change: 1 addition & 0 deletions apps/local-dashboard/.gitignore
@@ -0,0 +1 @@
dist/
75 changes: 75 additions & 0 deletions apps/local-dashboard/HANDOFF.md
@@ -0,0 +1,75 @@
# Local Dashboard SPA - Handoff

## Architecture

React SPA built with Vite + TypeScript that consumes the **SQLite-backed v2 API endpoints** from the dashboard server. The server materializes JSONL logs into a local SQLite database (`~/.selftune/selftune.db`) and serves pre-aggregated query results.

### Data flow

```text
JSONL logs → materializeIncremental() → SQLite → getOverviewPayload() / getSkillReportPayload() → /api/v2/* → SPA
```

## What is implemented

- **Two routes**:
  - `/`: Overview with KPI section cards (with info tooltips), skill health grid with status filters (healthy/warning/critical/unknown), evolution feed (ActivityTimeline), unmatched queries, onboarding banner (dismissible, localStorage-persisted)
  - `/skills/:name`: Per-skill drilldown with usage stats (with info tooltips), invocation records, EvidenceViewer (collapsible evidence entries with markdown rendering, context banner), EvolutionTimeline (vertical timeline with pass-rate deltas, lifecycle legend), pending proposals, tab descriptions via hover tooltips
- **UX helpers**: `InfoTip` component for glossary tooltips on all metrics, lifecycle legend in evolution timeline, evidence context banner, onboarding flow for first-time users
- **Data layer**: TanStack Query (`@tanstack/react-query`) with smart caching, fetching from v2 endpoints backed by SQLite materialized queries
  - `GET /api/v2/overview`: combined `getOverviewPayload()` + `getSkillsList()`
  - `GET /api/v2/skills/:name`: `getSkillReportPayload()` + evolution audit + pending proposals
- **Live updates**: 15-second polling interval via TanStack Query `refetchInterval` (replaced old SSE approach)
- **Caching**: `staleTime` of 10s (overview) / 30s (skill report) for instant back-navigation; `gcTime` of 5 minutes; automatic background refetch on window focus
- **Loading/error/empty/not-found states** on every route
- **UI framework**: shadcn/ui components with dark/light theme toggle, TanStack Table for data grids
- **Design**: selftune branding, collapsible sidebar, Tailwind v4
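
The polling and caching timings listed above can be collected into plain option objects. This is only a sketch: the field names follow TanStack Query's `useQuery` options (`staleTime`, `gcTime`, `refetchInterval`, `refetchOnWindowFocus`), but the helper names, the `DashboardQueryOptions` type, and the query keys are hypothetical and not taken from the PR. Whether the skill report also polls is not stated, so it is left without a `refetchInterval` here.

```typescript
// Illustrative only: helper names, type name, and query keys are assumptions.
const SECOND = 1000;
const MINUTE = 60 * SECOND;

interface DashboardQueryOptions {
  queryKey: readonly string[];
  staleTime: number; // how long cached data counts as fresh
  gcTime: number; // how long unused cache entries are kept
  refetchInterval?: number; // background polling period, if any
  refetchOnWindowFocus: boolean;
}

// Overview: 10s stale time, 15s polling, 5 minute cache retention.
function overviewQueryOptions(): DashboardQueryOptions {
  return {
    queryKey: ["overview"],
    staleTime: 10 * SECOND,
    gcTime: 5 * MINUTE,
    refetchInterval: 15 * SECOND,
    refetchOnWindowFocus: true,
  };
}

// Skill report: 30s stale time, same cache retention.
function skillReportQueryOptions(skillName: string): DashboardQueryOptions {
  return {
    queryKey: ["skill-report", skillName],
    staleTime: 30 * SECOND,
    gcTime: 5 * MINUTE,
    refetchOnWindowFocus: true,
  };
}
```

In a component, objects like these would be spread into `useQuery` together with a `queryFn` that fetches the matching v2 endpoint.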

## How to run

```bash
# Terminal 1: Start the dashboard server
selftune dashboard --port 7888

# Terminal 2: Start the SPA dev server (proxies /api to port 7888)
cd apps/local-dashboard
bun install
bunx vite
# → opens at http://localhost:5199
```
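
The dev-server behavior in the comments above (SPA on port 5199, `/api` proxied to the dashboard server on 7888, build output in `dist/`) would normally live in `vite.config.ts`. The object below is a guess at the relevant settings, not the file from this PR; `server.port`, `server.proxy`, and `build.outDir` are standard Vite options, and in a real config the object would be passed to `defineConfig()` from `vite`.

```typescript
// Assumed shape of the dev settings; the actual vite.config.ts may differ.
const DASHBOARD_SERVER_PORT = 7888; // matches `selftune dashboard --port 7888`

const devConfig = {
  server: {
    port: 5199,
    proxy: {
      // Forward API calls from the Vite dev server to the dashboard server.
      "/api": `http://localhost:${DASHBOARD_SERVER_PORT}`,
    },
  },
  build: {
    outDir: "dist",
  },
};
```

If the dashboard server is started on a different port, the proxy target has to change with it.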

## What was rebased / changed

- **SPA types**: Rewritten to match `queries.ts` payload shapes (`OverviewResponse`, `SkillReportResponse`, `SkillSummary`, `EvidenceEntry`)
- **API layer**: Now calls `/api/v2/overview` and `/api/v2/skills/:name` instead of `/api/data` + `/api/evaluations/:name`
- **SSE removed**: Replaced with 15s polling (SQLite reads are cheap, SSE was complex)
- **Overview page**: Uses `SkillSummary[]` from `getSkillsList()` for skill cards (pre-aggregated pass rate, check count, sessions)
- **Skill report page**: Single fetch to v2 endpoint instead of parallel overview + evaluations fetch. Shows evidence entries, evolution audit history per skill
- **Hooks**: Migrated to TanStack Query: `useOverview` uses `useQuery` with `refetchInterval`, `useSkillReport` uses `useQuery` with smart retry (skips retry on 404). Manual polling, request deduplication, and stale-request guards replaced by TanStack Query built-ins.
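
The "skips retry on 404" behavior can be expressed as a retry predicate. TanStack Query accepts a function `(failureCount, error) => boolean` for its `retry` option; the sketch below assumes the fetch layer throws an error object with a numeric `status` field and uses a retry cap of 3. Both the error shape and the cap are assumptions, not details confirmed by the PR.

```typescript
// Hypothetical error shape thrown by the fetch layer on non-2xx responses.
interface HttpError {
  status: number;
  message: string;
}

function isHttpError(error: unknown): error is HttpError {
  return (
    typeof error === "object" &&
    error !== null &&
    typeof (error as { status?: unknown }).status === "number"
  );
}

const MAX_RETRIES = 3; // assumed cap

// A missing skill will not appear on retry, so give up immediately on 404.
// Other failures (network errors, 5xx) are retried up to MAX_RETRIES times.
function shouldRetrySkillReport(failureCount: number, error: unknown): boolean {
  if (isHttpError(error) && error.status === 404) return false;
  return failureCount < MAX_RETRIES;
}
```

Failing fast on 404 lets the not-found state render right away instead of waiting through the retry backoff.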

## Query optimizations

- **Pending proposals**: Replaced `NOT IN` subquery + JS `Set` dedup with `LEFT JOIN + IS NULL + GROUP BY` in both `queries.ts` and `dashboard-server.ts`
- **Evidence query bounded**: Added `LIMIT 200` to `getSkillReportPayload()` evidence query (was unbounded)
- **Indexes**: 16 indexes defined in `schema.ts` covering all frequent filter/join columns (`skill_name`, `session_id`, `proposal_id`, `timestamp`, `query+triggered`)
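
To illustrate the `NOT IN` to `LEFT JOIN + IS NULL + GROUP BY` rewrite, here is a sketch of what such a pending-proposals query could look like, with an in-memory TypeScript reference for the rows it should return. The table name `evolution_audit`, the `action` column, and its values are assumptions for illustration; the real schema lives in `schema.ts` and may differ.

```typescript
// Assumed audit row shape; table and column names are hypothetical.
interface AuditRow {
  proposal_id: string;
  skill_name: string;
  action: "created" | "validated" | "deployed" | "rejected" | "rolled_back";
  timestamp: string; // ISO 8601, so string order equals time order
}

interface PendingProposal {
  proposal_id: string;
  skill_name: string;
  last_seen: string;
}

// Anti-join: keep open rows that have no terminal row for the same proposal,
// and collapse duplicates with GROUP BY instead of a JS-side Set.
const PENDING_PROPOSALS_SQL = `
  SELECT a.proposal_id, a.skill_name, MAX(a.timestamp) AS last_seen
  FROM evolution_audit AS a
  LEFT JOIN evolution_audit AS t
    ON t.proposal_id = a.proposal_id
   AND t.action IN ('deployed', 'rejected', 'rolled_back')
  WHERE a.action IN ('created', 'validated')
    AND t.proposal_id IS NULL
  GROUP BY a.proposal_id, a.skill_name
  ORDER BY last_seen DESC
`;

const TERMINAL_ACTIONS: ReadonlySet<string> = new Set(["deployed", "rejected", "rolled_back"]);

// In-memory reference for the rows the query above is meant to return.
function pendingProposals(rows: AuditRow[]): PendingProposal[] {
  const closed = new Set<string>();
  for (const row of rows) {
    if (TERMINAL_ACTIONS.has(row.action)) closed.add(row.proposal_id);
  }
  const latest = new Map<string, PendingProposal>();
  for (const row of rows) {
    if (TERMINAL_ACTIONS.has(row.action) || closed.has(row.proposal_id)) continue;
    const key = `${row.proposal_id}\u0000${row.skill_name}`;
    const seen = latest.get(key);
    if (seen === undefined || row.timestamp > seen.last_seen) {
      latest.set(key, {
        proposal_id: row.proposal_id,
        skill_name: row.skill_name,
        last_seen: row.timestamp,
      });
    }
  }
  return Array.from(latest.values()).sort((a, b) =>
    a.last_seen < b.last_seen ? 1 : a.last_seen > b.last_seen ? -1 : 0,
  );
}
```

Besides removing the JS-side dedup, the join form avoids the `NOT IN` pitfall where a single `NULL` in the subquery result makes the predicate match nothing.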

## What now uses SQLite / materialized queries

- **Overview**: `getOverviewPayload(db)` for evolution, unmatched queries, pending proposals, counts; `getSkillsList(db)` for per-skill aggregated stats
- **Skill report**: `getSkillReportPayload(db, skillName)` for usage stats, recent invocations, evidence; direct SQL for evolution audit + pending proposals per skill
- **Server**: `materializeIncremental(db)` runs at startup, then re-runs when a v2 endpoint is accessed, at most once every 15s
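
One way to get "refresh on access, but no more often than every 15s" is a small throttle around the materializer. The helper below is a hypothetical sketch, not the server's actual code: `createThrottledRefresh` is an invented name, and the injectable clock exists only to make the behavior easy to check.

```typescript
// Wraps a refresh function so it runs at most once per interval.
// Returns a function that reports whether the refresh actually ran.
function createThrottledRefresh(
  refresh: () => void,
  intervalMs: number,
  now: () => number = Date.now,
): () => boolean {
  let lastRun = Number.NEGATIVE_INFINITY; // first call always refreshes

  return () => {
    const current = now();
    if (current - lastRun < intervalMs) return false; // still fresh enough
    lastRun = current;
    refresh();
    return true;
  };
}
```

A v2 request handler would call the returned function before querying, passing something like `() => materializeIncremental(db)` as `refresh`, so idle servers do no work and busy ones re-materialize at a bounded rate.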

## What still depends on old dashboard code

- The old v1 endpoints (`/api/data`, `/api/events`, `/api/evaluations/:name`) still work and are used by the legacy `dashboard/index.html`
- Badge endpoints (`/badge/:name`) and report HTML endpoints (`/report/:name`) use the old `computeStatus` + JSONL reader path
- Action endpoints (`/api/actions/*`) are unchanged

## What remains before this can become default

1. ~~**Serve built SPA from dashboard-server**~~: Done. `/` serves SPA, old dashboard at `/legacy/`
2. ~~**Production build**~~: Done. `bun run build:dashboard` in root package.json
3. **Regression detection**: The SQLite layer doesn't compute regression detection yet; `deriveStatus()` currently only uses pass rate + check count. Add a `regression_detected` column to skill summaries when the monitoring snapshot computation moves to SQLite.
4. **Monitoring snapshot migration**: Move `computeMonitoringSnapshot()` logic into the SQLite materializer or a query helper (window sessions, false negative rate, baseline comparison)
5. **Actions integration**: Wire up watch/evolve/rollback buttons in the SPA to `/api/actions/*`
6. **Migrate badge/report endpoints**: Switch to SQLite-backed queries