diff --git a/.agent/rules/claude.md b/.agent/rules/claude.md new file mode 100644 index 0000000..c653fc5 --- /dev/null +++ b/.agent/rules/claude.md @@ -0,0 +1,141 @@ +--- +trigger: always_on +--- + +# Claude Instructions for code-executor-mcp + +> πŸ“š **Quick Reference:** Type these in chat to load into context: +> - `@docs/coding-standards.md` - SOLID/DRY/KISS, TDD, best practices +> - `@docs/release-workflow.md` - Patch/minor/major release steps + +## 🚨 CRITICAL: Always Use Code Executor MCP + +**MANDATORY:** Use `mcp__code-executor__executeTypescript` + `callMCPTool` for ALL operations: +- ❌ **DON'T:** Write tool, Read tool, Bash commands for file operations +- βœ… **DO:** `executeTypescript` with `callMCPTool('mcp__filesystem__write_file', ...)` + +**Why this matters:** +- Single round-trip (discover + execute + verify in one call) +- Tests the actual MCP we're building (dogfooding) +- Variables persist across operations (no context switching) +- Real-world usage pattern that validates our architecture + +**Example - File Operations:** +```typescript +// ❌ BAD: Using traditional tools +Write('/tmp/test.json', content); // Doesn't test our MCP + +// βœ… GOOD: Using code-executor MCP +await mcp__code-executor__executeTypescript({ + code: ` + const tools = await discoverMCPTools({ search: ['file'] }); + const content = JSON.stringify({ test: true }, null, 2); + await callMCPTool('mcp__filesystem__write_file', { + path: '/tmp/test.json', + content + }); + const result = await callMCPTool('mcp__filesystem__read_file', { + path: '/tmp/test.json' + }); + console.log('Verified:', JSON.parse(result.content)); + `, + allowedTools: ['mcp__filesystem__*'] +}); +``` + +**When to use traditional tools:** +- Reading project source code for review/analysis +- Git operations (commits, merges, branches) +- Build/test commands (`npm run build`, `npm test`) +- Everything else: Use code-executor MCP + +## Project Overview + +**code-executor-mcp** - Universal MCP server with progressive disclosure | **98% token reduction** (141k β†’ 1.6k) + +**Core Concept:** 2 execution tools (`executeTypescript`, `executePython`) call other MCPs on-demand via `callMCPTool('mcp__server__tool', params)` + +**Key Features:** Progressive disclosure | AJV schema validation | AsyncLock schema cache | Deno sandbox | Multi-transport (STDIO/HTTP) + +## Current State + +**Version:** v0.3.1 (pre-1.0 beta) | **Branches:** `main` (stable, PR-only) + `develop` (active) | **Stack:** TypeScript 5.x + Node.js 20+ + @modelcontextprotocol/sdk + AJV + async-lock + Vitest + Deno + +**Recent:** Deep validation (AJV) | AsyncLock mutex | 253 tests (98%+ coverage) | Runtime validation primary approach + +## Architecture + +**Components:** MCP Proxy Server | MCP Client Pool (STDIO/HTTP) | Schema Cache (24h TTL, AsyncLock) | Schema Validator (AJV) | Executors (TypeScript/Deno, Python) + +**Key Files:** `package.json` | `CHANGELOG.md` | `RELEASE.md` | `SECURITY.md` + +## Development Workflow + +**Branch Strategy:** Work on `develop` β†’ PR to `main` β†’ `npm version` β†’ `gh release create` β†’ sync `develop` + +**Commands:** `npm test` | `npm run typecheck` | `npm run build` | `npm run lint` + +**Standards:** TDD mandatory | 98%+ coverage (validation/caching) | TypeScript strict | SOLID principles | Security first + +**Important:** When performing these tasks, reference the relevant docs: +- **Writing code?** Reference @docs/coding-standards.md for SOLID/DRY/KISS principles, TDD requirements +- **Creating release?** Reference @docs/release-workflow.md for step-by-step patch/minor/major instructions + +## Key Decisions + +**AJV:** Industry-standard | Deep recursive validation | Self-documenting errors | Zero maintenance +**AsyncLock:** Prevents race conditions | Thread-safe cache writes | Production-ready +**24h TTL:** Schemas rarely change | Reduces network overhead | Stale-on-error resilience + +## Common Tasks + +**Feature:** `develop` branch β†’ TDD β†’ implement β†’ tests β†’ CHANGELOG β†’ commit β†’ PR +**Bugfix:** Failing test β†’ fix β†’ verify β†’ CHANGELOG β†’ `fix:` commit +**Release:** See [Release Workflow](docs/release-workflow.md) for step-by-step instructions (patch/minor/major) + +## Testing + +**Structure:** Vitest + TypeScript | Mock dependencies | `vi.useFakeTimers()` | Test edge cases +**Coverage:** Validation 98%+ | Caching 70%+ | Overall 90%+ +**Focus:** βœ… Logic/errors/edge cases/security | ❌ Third-party libs + +## Security (ZERO TOLERANCE) + +**Validation:** MUST validate all MCP tool calls | Nested objects/arrays recursive | No type coercion | No info leakage +**Sandbox:** Minimal Deno permissions | Block eval/exec/__import__ | Prevent path traversal | Rate limiting +**Audit:** Log all executions (timestamp, tool, params hash, status) | NO sensitive data + +## Dependencies + +**Production:** @modelcontextprotocol/sdk | ajv ^8.17.1 | async-lock ^1.4.1 | zod | ws +**Development:** vitest | typescript | @types/async-lock + +## Troubleshooting + +**Fake Timers:** `vi.useFakeTimers()` in `beforeEach` | `vi.advanceTimersByTime()` | `vi.useRealTimers()` in `afterEach` +**Cache Corruption:** Check AsyncLock | Delete `~/.code-executor/schema-cache.json` +**Validation:** Check AJV errors | Verify schema | Test minimal params first + +## Available Agents (Use Proactively) + +- **code-guardian** - Review code quality, SOLID principles, MCP patterns, security (use after implementation) +- **inquisitor** - Debug complex issues, trace root causes, systematic investigation (use for bugs) +- **project-librarian** - Explore codebase, find files/functions, understand structure (use before changes) +- **project-documentarian** - Maintain devlogs, preserve context, JSDoc enhancement (use for documentation) +- **document-reviewer** - Review documentation quality and completeness (use for docs) +- **research-specialist** - Fetch latest library docs, research technical questions (use for unknowns) + +## Available Slash Commands (Use Proactively) + +- **/build** - Build with TypeScript/ESLint enforcement, clean dist/ artifacts +- **/code-review** - Comprehensive review against MCP server standards, invoke code-guardian +- **/commit** - Create proper git commits with validation, handle pre-commit hooks +- **/debug** - Investigate MCP server issues, schema validation, concurrency problems +- **/fix** - Fix issues at root cause, enforce proper solutions (no quick hacks) +- **/test** - Execute Vitest tests, focus on validation/caching/security coverage +- **/compact_FILE** - Consolidate verbose files, remove duplicates, preserve all info +- **/split-context** - Extract area-specific content into local CLAUDE.md files + +## Contact + +**Issues:** https://github.com/aberemia24/code-executor-MCP/issues | **Email:** aberemia@gmail.com | **Docs:** https://github.com/aberemia24/code-executor-MCP#readme diff --git a/.agent/rules/coding-standards.md b/.agent/rules/coding-standards.md new file mode 100644 index 0000000..2b6a7a8 --- /dev/null +++ b/.agent/rules/coding-standards.md @@ -0,0 +1,146 @@ +--- +trigger: always_on +--- + +# Code Executor MCP - Coding Standards + +**Project:** MCP orchestration server | **Stack:** Node.js 22+ | TypeScript 5.x (strict) | Vitest 4.0 | AJV 8.x | Deno 2.x + +## ⚑ ZERO TOLERANCE + +Build fails on violations. NO workarounds. **Priority:** Security > Validation > Architecture > Style + +## πŸ”΄ CRITICAL RULES + +### Security & Validation +- **AJV validation MANDATORY** - ALL MCP tool calls validated (deep recursive, no bypass) +- **NO type coercion** - Strict type checking (integer β‰  number) +- **Sandbox isolation** - Deno permissions minimal, dangerous pattern detection +- **AsyncLock MANDATORY** - ALL concurrent disk writes (schema cache, audit logs) +- **Audit everything** - Tool calls, executions, failures with timestamps +- **NO hardcoded secrets** - Env vars only, validated with Zod + +### Architecture +- **SOLID** - SRP strict | NO God Objects | KISS | DRY pragmatic | YAGNI +- **NO ANY types** - Use `unknown` + type guards +- **Progressive disclosure** - Tools loaded on-demand, not upfront +- **Race condition free** - AsyncLock mutex for all shared resources + +### Testing & Quality +- **TDD MANDATORY** - Business logic and validation (98%+ coverage) +- **Edge cases first** - Nested objects, concurrent access, TTL expiration +- **Fake timers** - Use `vi.useFakeTimers()` for time-based tests (NO setTimeout) +- **Coverage goals** - Validation 98%+ | Caching 70%+ | Overall 90%+ + +## 🧠 STACK + +**Runtime:** Node.js 22+ LTS | **Executors:** Deno 2.x (TS), Python 3.9+ | **Testing:** Vitest 4.0 | **Validation:** AJV 8.x +**MCP:** @modelcontextprotocol/sdk | **Concurrency:** async-lock | **Transport:** STDIO + HTTP/SSE + +## πŸ“‹ PATTERNS + +### Schema Validation (AJV) +```typescript +const result = validator.validate(params, schema); +if (!result.valid) throw new Error(validator.formatError(toolName, params, schema, result)); +``` + +### Cache Access (AsyncLock) +```typescript +await this.lock.acquire('cache-write', async () => { await fs.writeFile(cachePath, data); }); +``` + +### MCP Tool Calls (Progressive Disclosure) +```typescript +const result = await callMCPTool('mcp__zen__codereview', { step: '...', step_number: 1 }); +``` + +## πŸ§ͺ TESTING + +| Component | Coverage | Approach | +|-----------|----------|----------| +| Validation | 98%+ | TDD: REDβ†’GREENβ†’REFACTOR | +| Caching | 70%+ | Race conditions, TTL, concurrency | +| Executors | 80%+ | Sandbox escapes, permissions | +| Security | 95%+ | Input validation, pattern detection | + +**Pass rates:** Validation β‰₯98% | Core β‰₯90% | Integration β‰₯80% + +### Test Standards +```typescript +beforeEach(() => vi.useFakeTimers()); +afterEach(() => vi.useRealTimers()); +vi.advanceTimersByTime(150); // Deterministic time control +``` + +## πŸš€ BUILD + +- **NO suppression** - `ignoreBuildErrors: false` | NO `@ts-ignore` +- **TypeScript strict** - Full strict mode enabled +- **Pre-commit** - `npm run lint && npm run typecheck && npm run build && npm test` +- **Environment** - Node.js v22.x LTS | npm | TypeScript 5.x strict + +## πŸ“ REFERENCE + +### Naming +| Element | Format | Example | +|---------|--------|---------| +| Files | kebab-case | `schema-cache.ts` | +| Classes | PascalCase | `SchemaValidator` | +| Functions | camelCase | `getToolSchema()` | +| Constants | UPPER_SNAKE | `DEFAULT_TTL_MS` | + +### Commands +```bash +npm run lint && npm run typecheck && npm run build && npm test # Pre-commit +npm run server # Start MCP server +npm test # Run all tests +npm run typecheck # TypeScript check +``` + +## 🚫 FORBIDDEN + +### Validation +❌ Skipping AJV validation | ❌ Type coercion | ❌ Shallow validation | ❌ Bypassing schema checks + +### Build +❌ `@ts-ignore` | ❌ `any` types | ❌ `ignoreBuildErrors: true` | ❌ Unvalidated inputs + +### Concurrency +❌ Concurrent writes without mutex | ❌ Shared resource without lock + +### Security +❌ Hardcoded secrets | ❌ Missing sandbox permissions | ❌ Path traversal | ❌ Command injection + +### Testing +❌ `setTimeout` in tests | ❌ Skipping edge cases | ❌ Missing coverage on validation + +### Deprecated +❌ Custom shallow validation | ❌ Wrappers as primary approach | ❌ Unprotected disk writes + +## πŸ”’ SECURITY + +### Input Validation +- **ALL external inputs** validated (MCP calls, env vars, file paths) +- **Deep recursive** - Nested objects, arrays, constraints, enums +- **Type strict** - No coercion (integer vs number) + +### Sandbox Isolation +- **Deno minimal permissions** - Read/write/net restricted +- **Dangerous patterns blocked** - eval, exec, __import__, pickle.loads +- **Path validation** - No directory traversal +- **Rate limiting** - 30 req/min default + +### Audit Logging +- **ALL executions** logged (timestamp, tool, params hash, status) +- **NO sensitive data** in logs + +## πŸ“Š METRICS + +**Coverage:** Validation 98.27% | Cache 74% | Overall 90%+ +**Token Savings:** 98% (141k β†’ 1.6k tokens) +**Build:** <30s | **Test:** <60s + +--- + +**Version:** 0.3.1 | **Node.js:** v22.x LTS | **Enforcement:** ESLint + TypeScript strict + pre-commit + CI/CD diff --git a/.agent/workflows/build.md b/.agent/workflows/build.md new file mode 100644 index 0000000..afe987c --- /dev/null +++ b/.agent/workflows/build.md @@ -0,0 +1,87 @@ +--- +argument-hint: [clean|production] +description: Builds code-executor-mcp with strict TypeScript/ESLint enforcement, validates MCP server compilation +allowed-tools: Bash, BashOutput, KillShell, Read, TodoWrite, Glob +--- + +Build "$ARGUMENTS" (default: development) + +## 🚨 CRITICAL BUILD LAWS + +**Non-Negotiable Rules:** + +- πŸ“¦ **ZERO TOLERANCE:** TypeScript/ESLint errors WILL fail build +- 🎯 **Fix FIRST error**, not loudest (root cause analysis) +- βš™οΈ **Type Safety:** Full TypeScript strict mode enforcement +- πŸ”§ **Clean Build:** dist/ directory must compile successfully + +--- + +## 🧹 CLEAN (Nuclear Option) + +**When to clean:** Corrupted cache, mysterious build failures, or explicit `clean` argument + +```bash +# Remove all build artifacts +rm -rf dist node_modules/.cache + +# Clear schema cache +rm -rf ~/.code-executor/schema-cache.json + +# Reinstall if package.json changed +npm install +``` + +--- + +## πŸ—οΈ BUILD VALIDATION (MANDATORY SEQUENCE) + +**MCP Server compilation chain:** + +``` +TypeScript Compilation β†’ Type Checking β†’ Linting β†’ dist/ Output +``` + +**Why:** Type safety ensures MCP tool schemas are correctly typed and validated + +--- + +## πŸ” COMMON FAILURES & FIXES + +| Error | Root Cause | Solution | +| -------------------------- | ------------------------------ | ---------------------------------- | +| `Cannot find module` | Invalid import path | Check tsconfig.json paths | +| `Type error in executor` | Schema validation types wrong | Check AJV types and validators | +| `dist/ incomplete` | Build interrupted | `rm -rf dist && npm run build` | +| `Schema cache error` | Corrupted cache file | `rm ~/.code-executor/schema-cache.json` | +| `@ts-ignore present` | Type safety bypassed | FORBIDDEN - Fix type issues | + +--- + +## ⚑ QUALITY CIRCUIT TRIGGER + +### Pre-Build Validation + +**ALWAYS run before build:** + +```bash +npm run lint && npm run typecheck && npm run build +``` + +### Build Failure Escalation + +**TypeScript/ESLint errors β†’ STOP and fix immediately:** +- **Schema changes** β†’ Verify schema-validator.ts types +- **Type errors in executors** β†’ Check TypeScript/Python executor types +- **MCP SDK version mismatch** β†’ Verify @modelcontextprotocol/sdk version + +### Success Path + +1. If build **PASSES** β†’ Run test suite (`npm test`) +2. **EXCEPTION:** Skip if issue documented in development notes + +**Safety Limit:** Max 5 circuit iterations to prevent infinite loops + +--- + +**Type safety is LAW. Nuclear clean when corrupted.** \ No newline at end of file diff --git a/.agent/workflows/code-review.md b/.agent/workflows/code-review.md new file mode 100644 index 0000000..75172e3 --- /dev/null +++ b/.agent/workflows/code-review.md @@ -0,0 +1,149 @@ +--- +argument-hint: [file-or-pattern] +description: Performs comprehensive code review after implementation, checks MCP server standards, invokes code-guardian agent +allowed-tools: Task, TodoWrite, Bash, Glob, Grep, Read, WebSearch, mcp__code-executor__executeTypescript +--- + +Code Review "$ARGUMENTS" (or last changes if empty) + +## πŸ“‹ CONTEXT + +**Project:** code-executor-mcp - Universal MCP server with progressive disclosure + +**Stack:** TypeScript 5.x + Node.js 20+ + @modelcontextprotocol/sdk + AJV + async-lock + Vitest + Deno sandbox + +**Development Phase:** v0.3.x (pre-1.0 beta) + +**Review Philosophy:** + +- ❌ NO enterprise bullshit or theoretical concerns +- βœ… Focus on what code ACTUALLY does (not fantasy scenarios) +- βœ… Check architecture standards in **docs/architecture.md** and **CLAUDE.md** +- βœ… REAL issues that break builds only +- βœ… MCP Server Quality: schema validation, security, type safety + +--- + +## πŸ›‘οΈ INVOKE CODE-GUARDIAN (MANDATORY) + +**Use Task tool with code-guardian agent:** + +``` +Review type: "full" +Project: "code-executor-mcp - MCP Server with progressive disclosure" +Context: "DEVELOPMENT - Apply DEVELOPMENT CONTEXT FILTERS first: Working+tested code stays. Prove issues with measurements, not theory. REJECT production theater (scaling, monitoring, circuit breakers). Report ONLY: build breaks, proven security holes, actual bugs." +Focus: SOLID/DRY/KISS violations, MCP SDK patterns, AJV schema validation, security sandbox escapes, actual bugs +``` + +--- + +## 🚨 CRITICAL VIOLATIONS (ZERO TOLERANCE) + +- ❌ Hardcoded secrets, API keys, MCP server URLs +- ❌ `@ts-ignore` without explicit justification +- ❌ Missing schema validation for MCP tool parameters +- ❌ Sandbox escapes (eval, exec, __import__ in Deno) +- ❌ Direct file system access without permission checks +- ❌ `any` types without explicit justification +- ❌ Missing error handling in executor wrappers +- ❌ Schema cache race conditions (missing AsyncLock) +- ❌ Unvalidated MCP client pool connections + +--- + +## βœ… REAL REVIEW CHECKLIST + +**Build & Standards:** + +- Will it compile? (`npm run build`) +- Pass TypeScript strict mode? (`npm run typecheck`) +- Pass linting? (`npm run lint`) +- Node.js 20+ compatible? + +**MCP Server Patterns:** + +- MCP SDK @modelcontextprotocol/sdk used correctly +- All tool schemas properly defined +- Tool handlers return correct response format +- Error handling with proper MCP error codes + +**Type Safety & Validation:** + +- All MCP tool parameters validated with AJV +- Deep recursive validation (nested objects/arrays) +- No type coercion (strict type checking) +- Schema cache properly typed + +**Security:** + +- Deno sandbox permissions minimal (read/write/net restrictions) +- Dangerous pattern detection (eval, exec, path traversal) +- Rate limiting implemented +- Audit logs for tool executions +- No sensitive data in error messages + +**Concurrency & Caching:** + +- AsyncLock mutex for schema cache writes +- No race conditions on concurrent tool calls +- TTL handling correct (24h default) +- Stale-on-error pattern implemented + +**Testing:** + +- Vitest tests exist for new code +- 90%+ coverage for validation/caching code +- Edge cases tested (concurrent access, TTL expiration) +- Mock external dependencies (MCPClientPool, fs) + +--- + +## πŸ™… SKIP PRODUCTION THEATER + +**Filter out these nonsense concerns:** + +- ❌ "Not production-ready" (we're on DEVELOP) +- ❌ "Needs enterprise monitoring" +- ❌ "99.99% uptime" requirements +- ❌ "Horizontal scaling" concerns +- ❌ "Circuit breakers" overkill + +**We're on DEVELOP, not running a bank. REAL ISSUES ONLY.** + +--- + +## βœ… VALIDATE + +**Mandatory quality checks:** + +```bash +npm run lint && npm run typecheck && npm run build +``` + +**Additional checks:** + +```bash +# Check for hardcoded secrets +grep -r "sk-" src/ || echo "OK: No API keys found" +grep -r "process.env" src/ | grep -v "NODE_ENV" || echo "OK: No direct env access" + +# Verify schema validation +grep -r "validate(" src/ | wc -l + +# Check Deno sandbox permissions +grep -r "dangerouslyDisableSandbox" src/ && echo "WARNING: Sandbox disabled" + +# Verify AsyncLock usage +grep -r "schemaLock" src/schema-cache.ts || echo "ERROR: Missing mutex" +``` + +--- + +## ⚑ QUALITY CIRCUIT TRIGGER + +**Automated enforcement after review completes:** + +1. If severity β‰₯ MEDIUM β†’ **CRITICAL: automatically INVOKE /fix immediately** +2. If >2 LOW severity issues β†’ **CRITICAL: automatically INVOKE /fix immediately** + +**Safety Limit:** Max 5 circuit iterations to prevent infinite loops \ No newline at end of file diff --git a/.agent/workflows/commit.md b/.agent/workflows/commit.md new file mode 100644 index 0000000..d9aa862 --- /dev/null +++ b/.agent/workflows/commit.md @@ -0,0 +1,148 @@ +--- +argument-hint: [message|--amend|--squash] +description: Creates proper git commits with validation for code-executor-mcp, follows TypeScript/MCP server standards, handles pre-commit hooks +allowed-tools: Bash, BashOutput, Read, Glob, Grep, TodoWrite, mcp__ide__getDiagnostics +--- + +Commit "$ARGUMENTS" - code-executor-mcp Project Standards + +## 🚨 ZERO TOLERANCE + +**Forbidden Actions:** + +- ❌ NO force push to `develop`/`master` +- ❌ NO commits without validation +- ❌ NO `--amend` on others' work +- ❌ NO secrets in commits (API keys, database URLs, tokens) +- ❌ NEVER `--no-verify` without explicit user request +- ❌ NO `@ts-ignore` or `ignoreBuildErrors: true` +- ❌ NO hardcoded env vars (use validated env config) + +--- + +## βœ… PRE-COMMIT VALIDATION + +**Mandatory quality checks for code-executor-mcp:** + +```bash +# 1. Code quality (TypeScript strict mode + ESLint) +npm run lint && npm run typecheck + +# 2. Build verification (zero tolerance - must pass) +npm run build + +# 3. Test coverage check +npm test + +# 4. Review changes +git status && git diff --cached +``` + +--- + +## πŸ§ͺ TEST GATE + +**code-executor-mcp testing strategy:** + +| Change Type | Test Requirement | +| --------------------- | --------------------------------------- | +| Validation logic | Vitest tests MUST pass (β‰₯90% coverage) | +| Schema caching | Tests REQUIRED (concurrency, TTL) | +| MCP tool handlers | Integration tests RECOMMENDED | +| Security features | Tests REQUIRED (sandbox, permissions) | +| Bug fixes | Regression test REQUIRED | +| NO tests for logic | **BLOCK commit** | +| Tests fail | **BLOCK commit** | + +**Test commands:** +- All tests: `npm test` +- Watch mode: `npm run test:watch` +- Coverage: `npm run test:coverage` + +--- + +## πŸ“ COMMIT MESSAGE FORMAT + +``` +feat(validator): add deep schema validation with AJV + +Implement recursive validation for nested objects and arrays +to replace shallow custom validator. + +πŸ€– Generated with [Claude Code](https://claude.com/claude-code) + +Co-Authored-By: Claude +``` + +**Format Rules:** + +- **Type:** `feat` / `fix` / `refactor` / `chore` / `docs` / `test` +- **Scope:** `(validator)` / `(cache)` / `(executor)` / `(mcp)` / `(security)` / `(config)` +- **Body:** Explain WHY (2-3 sentences max), not WHAT (code shows what) +- **Footer:** Always include Claude Code attribution (shown above) + +--- + +## πŸ”’ SAFETY CHECKS + +**code-executor-mcp Branch Protection:** + +- βœ… Work on `develop` branch (main development) +- 🚨 `main` branch = stable releases (no direct commits, PR-only) +- 🚨 Schema cache = never commit `~/.code-executor/schema-cache.json` +- 🚨 Never commit `.env` files, API keys, or MCP server credentials + +**Pre-Amend Checks:** + +```bash +# Verify commit NOT pushed +git status # Must show "Your branch is ahead" + +# Check authorship BEFORE --amend +git log -1 --format='%an %ae' # NEVER amend others' commits +``` + +**Hook Failures:** + +- ONE retry allowed on pre-commit hook failures +- If hook modifies files β†’ safe to amend ONLY if you own the commit +- Otherwise β†’ create NEW commit + +--- + +## ⚑ QUALITY CIRCUIT TRIGGER + +**Auto-escalation before commit:** + +1. **TypeScript errors** β†’ **CRITICAL: Fix immediately** (strict mode enforced) +2. **ESLint errors** β†’ **CRITICAL: Run `npm run lint` first** +3. **Build fails** β†’ **CRITICAL: Run `npm run build` first** +4. **Tests fail** β†’ **CRITICAL: Run tests and fix failures** +5. **Missing AJV validation** β†’ **CRITICAL: Validate all MCP tool parameters** +6. Only commit when ALL checks pass + +--- + +## 🎯 CODE-EXECUTOR-MCP SPECIFIC CHECKS + +**Before committing, verify:** + +- βœ… AJV validation on all MCP tool parameters +- βœ… Schema cache AsyncLock mutex for concurrent access +- βœ… Deno sandbox permissions properly restricted +- βœ… JSDoc comments on public functions +- βœ… Error handling with proper MCP error codes +- βœ… Vitest tests for new validation/caching logic +- βœ… No hardcoded MCP server URLs or credentials + +**Security features:** +- βœ… Dangerous pattern detection (eval, exec, __import__) +- βœ… Path validation prevents directory traversal +- βœ… Rate limiting implemented +- βœ… Audit logs for tool executions + +--- + +**Commit discipline = Project quality = MCP server reliability** + +**Stack:** TypeScript 5.x + Node.js 20+ + @modelcontextprotocol/sdk + AJV + async-lock + Vitest diff --git a/.agent/workflows/compact_FILE.md b/.agent/workflows/compact_FILE.md new file mode 100644 index 0000000..470e372 --- /dev/null +++ b/.agent/workflows/compact_FILE.md @@ -0,0 +1,56 @@ +--- +argument-hint: [target-file] +description: Consolidates AGENTS.md files by removing duplicates, tightening verbose sections, migrating to child files +allowed-tools: Read, Edit, Write, Bash, Grep, TodoWrite +--- + +# Consolidate AGENTS.md "$ARGUMENTS" (or main AGENTS.md if empty) + +## 🎯 GOAL + +Transform kitchen-sink AGENTS.md files into efficient entry points: + +- **Target:** 40-65% reduction, ZERO info loss +- **Method:** Constitution + Navigation Map + Quick Reference + +--- + +## πŸ“‹ PROCESS + +### 1. Backup & Analyze + +`cp $TARGET $TARGET.backup-$(date +%Y%m%d-%H%M%S) && wc -l < $TARGET` + +**Find:** Duplicates in child files (REMOVE) | Verbose sections (TIGHTEN) | Misplaced details (MOVE) + +### 2. Actions + +**REMOVE** - Already in child files (grep verify first) +**MOVE** - Migrate to correct child file +**TIGHTEN** - Multi-line β†’ pipe-separated (`**Runtime:** Node 24 | **Frontend:** React 19`) +**REFERENCE** - Use `@child/AGENTS.md` pointers + +### 3. Validate + +`wc -l AGENTS.md && grep -c "CRITICAL\|NEVER" AGENTS.md` + +### 4. Audit against backup + +**CRITICAL** Check the new compacted AGENTS.md file, gainst its backup, make sure no information was missed. + +--- + +## βœ… MANDATORY CHECKLIST + +- [ ] Backup created with timestamp +- [ ] Remove duplicates (grep verify in child files FIRST) +- [ ] Move content to correct child files +- [ ] Tighten verbose sections (pipe-separated) +- [ ] Preserve ALL CRITICAL/NEVER/MANDATORY rules +- [ ] 40-65% reduction achieved +- [ ] All info preserved (grep verification) +- [ ] Audit of compacted version against the backup file + +--- + +**Detailed Guide:** docs/claude-md-consolidation-guide.md \ No newline at end of file diff --git a/.agent/workflows/debug.md b/.agent/workflows/debug.md new file mode 100644 index 0000000..b125aa0 --- /dev/null +++ b/.agent/workflows/debug.md @@ -0,0 +1,45 @@ +--- +argument-hint: +description: Use proactively to debug and investigate issues in the MCP server +allowed-tools: Task, TodoWrite, Bash, Glob, Grep, Read, Edit, MultiEdit, Write, WebFetch, WebSearch, mcp__code-executor__executeTypescript +--- + +Debug $ARGUMENTS - MCP Server Investigation + +## πŸ” DEBUGGING APPROACH + +**Use inquisitor agent for systematic debugging:** + +1. **Root Cause Analysis** - Trace error to origin +2. **Systematic Investigation** - Use logs, tests, and code inspection +3. **No Code Modification** - Investigation only, fixes happen in /fix + +## πŸ› οΈ DEBUGGING TOOLS + +**Code Executor:** Use `mcp__code-executor__executeTypescript` for: +- Multi-file analysis +- Stateful investigation workflows +- Schema validation testing +- MCP client pool inspection + +## 🎯 COMMON DEBUG SCENARIOS + +**Schema Validation Issues:** +- Check AJV validation errors +- Inspect schema cache state +- Verify nested object/array validation + +**Concurrency Issues:** +- Check AsyncLock mutex behavior +- Inspect race condition patterns +- Verify TTL expiration handling + +**MCP Client Issues:** +- Check MCP server connections +- Verify transport protocols (STDIO/HTTP) +- Inspect tool schema retrieval + +**Security Issues:** +- Check Deno sandbox permissions +- Verify dangerous pattern detection +- Inspect audit logs \ No newline at end of file diff --git a/.agent/workflows/fix.md b/.agent/workflows/fix.md new file mode 100644 index 0000000..5df6277 --- /dev/null +++ b/.agent/workflows/fix.md @@ -0,0 +1,78 @@ +--- +argument-hint: +description: Fixes issues at root cause level, prevents quick hacks, enforces proper solutions +allowed-tools: Task, TodoWrite, Bash, Glob, Grep, Read, Edit, MultiEdit, Write, WebFetch, WebSearch, mcp__code-executor__executeTypescript +--- + +Fix $ARGUMENTS - Root Cause, Not Symptoms + +**IMPORTANT** - if a gh issue is provided, please use the CLI to see it as the repo may be private. + +## 🚨 ZERO TOLERANCE + +**Forbidden Anti-Patterns:** + +- ❌ `@ts-ignore`, `any` types without justification +- ❌ Unvalidated MCP tool parameters +- ❌ Direct process.env access, hardcoded secrets, MCP server URLs +- ❌ Sandbox escapes (eval, exec, __import__) + +--- + +## 🧠 ULTRATHINK FIRST + +**Before writing any code:** + +1. **Root Cause Analysis** - Trace error to origin (not just symptoms) +2. **Map Dependencies** - Identify impacts across validator/cache/executor layers +3. **Question Assumptions** - One schema error can cascade through entire MCP server + +--- + +## πŸ” INVESTIGATE + +**Understanding Phase:** + +- Use **project-librarian agent** to understand code structure + - **CRITICAL:** For investigation ONLY, NOT for fixes +- Review **CLAUDE.md** and **docs/coding-standards.md** for MCP server patterns +- Check **CHANGELOG.md** for recent changes and known issues + +--- + +## πŸ”§ FIX + +**Implementation Requirements:** + +- βœ… Fix root cause only (update in-place, NO duplicates) +- βœ… Apply SOLID/DRY/KISS principles +- βœ… Maintain type safety: TypeScript strict mode +- βœ… Validate ALL MCP tool parameters with AJV +- βœ… Ensure AsyncLock mutex for schema cache writes +- βœ… Preserve Deno sandbox security + +**CRITICAL:** DO NOT USE SUB-AGENTS FOR FIXES - Direct implementation only + +--- + +## βœ… VALIDATE + +**Mandatory quality checks:** + +```bash +npm run lint && npm run typecheck && npm run build && npm test +``` + +**NO CORNER CUTTING. FIX IT RIGHT.** + +--- + +## ⚑ QUALITY CIRCUIT TRIGGER + +**Automated quality enforcement after fix completes:** + +1. **CRITICAL:** Run `npm run lint && npm run typecheck` +2. If TypeScript/ESLint errors β†’ Fix immediately (ZERO TOLERANCE) +3. Run test suite to verify fix: `npm test` +4. **CRITICAL** invoke automatically `/code-review` on the fixes if >LOW issues were fixed +**Safety Limit:** Max 5 circuit iterations to prevent infinite loops \ No newline at end of file diff --git a/.agent/workflows/speckit.analyze.md b/.agent/workflows/speckit.analyze.md new file mode 100644 index 0000000..98b04b0 --- /dev/null +++ b/.agent/workflows/speckit.analyze.md @@ -0,0 +1,184 @@ +--- +description: Perform a non-destructive cross-artifact consistency and quality analysis across spec.md, plan.md, and tasks.md after task generation. +--- + +## User Input + +```text +$ARGUMENTS +``` + +You **MUST** consider the user input before proceeding (if not empty). + +## Goal + +Identify inconsistencies, duplications, ambiguities, and underspecified items across the three core artifacts (`spec.md`, `plan.md`, `tasks.md`) before implementation. This command MUST run only after `/speckit.tasks` has successfully produced a complete `tasks.md`. + +## Operating Constraints + +**STRICTLY READ-ONLY**: Do **not** modify any files. Output a structured analysis report. Offer an optional remediation plan (user must explicitly approve before any follow-up editing commands would be invoked manually). + +**Constitution Authority**: The project constitution (`.specify/memory/constitution.md`) is **non-negotiable** within this analysis scope. Constitution conflicts are automatically CRITICAL and require adjustment of the spec, plan, or tasksβ€”not dilution, reinterpretation, or silent ignoring of the principle. If a principle itself needs to change, that must occur in a separate, explicit constitution update outside `/speckit.analyze`. + +## Execution Steps + +### 1. Initialize Analysis Context + +Run `.specify/scripts/bash/check-prerequisites.sh --json --require-tasks --include-tasks` once from repo root and parse JSON for FEATURE_DIR and AVAILABLE_DOCS. Derive absolute paths: + +- SPEC = FEATURE_DIR/spec.md +- PLAN = FEATURE_DIR/plan.md +- TASKS = FEATURE_DIR/tasks.md + +Abort with an error message if any required file is missing (instruct the user to run missing prerequisite command). +For single quotes in args like "I'm Groot", use escape syntax: e.g 'I'\''m Groot' (or double-quote if possible: "I'm Groot"). + +### 2. Load Artifacts (Progressive Disclosure) + +Load only the minimal necessary context from each artifact: + +**From spec.md:** + +- Overview/Context +- Functional Requirements +- Non-Functional Requirements +- User Stories +- Edge Cases (if present) + +**From plan.md:** + +- Architecture/stack choices +- Data Model references +- Phases +- Technical constraints + +**From tasks.md:** + +- Task IDs +- Descriptions +- Phase grouping +- Parallel markers [P] +- Referenced file paths + +**From constitution:** + +- Load `.specify/memory/constitution.md` for principle validation + +### 3. Build Semantic Models + +Create internal representations (do not include raw artifacts in output): + +- **Requirements inventory**: Each functional + non-functional requirement with a stable key (derive slug based on imperative phrase; e.g., "User can upload file" β†’ `user-can-upload-file`) +- **User story/action inventory**: Discrete user actions with acceptance criteria +- **Task coverage mapping**: Map each task to one or more requirements or stories (inference by keyword / explicit reference patterns like IDs or key phrases) +- **Constitution rule set**: Extract principle names and MUST/SHOULD normative statements + +### 4. Detection Passes (Token-Efficient Analysis) + +Focus on high-signal findings. Limit to 50 findings total; aggregate remainder in overflow summary. + +#### A. Duplication Detection + +- Identify near-duplicate requirements +- Mark lower-quality phrasing for consolidation + +#### B. Ambiguity Detection + +- Flag vague adjectives (fast, scalable, secure, intuitive, robust) lacking measurable criteria +- Flag unresolved placeholders (TODO, TKTK, ???, ``, etc.) + +#### C. Underspecification + +- Requirements with verbs but missing object or measurable outcome +- User stories missing acceptance criteria alignment +- Tasks referencing files or components not defined in spec/plan + +#### D. Constitution Alignment + +- Any requirement or plan element conflicting with a MUST principle +- Missing mandated sections or quality gates from constitution + +#### E. Coverage Gaps + +- Requirements with zero associated tasks +- Tasks with no mapped requirement/story +- Non-functional requirements not reflected in tasks (e.g., performance, security) + +#### F. Inconsistency + +- Terminology drift (same concept named differently across files) +- Data entities referenced in plan but absent in spec (or vice versa) +- Task ordering contradictions (e.g., integration tasks before foundational setup tasks without dependency note) +- Conflicting requirements (e.g., one requires Next.js while other specifies Vue) + +### 5. Severity Assignment + +Use this heuristic to prioritize findings: + +- **CRITICAL**: Violates constitution MUST, missing core spec artifact, or requirement with zero coverage that blocks baseline functionality +- **HIGH**: Duplicate or conflicting requirement, ambiguous security/performance attribute, untestable acceptance criterion +- **MEDIUM**: Terminology drift, missing non-functional task coverage, underspecified edge case +- **LOW**: Style/wording improvements, minor redundancy not affecting execution order + +### 6. Produce Compact Analysis Report + +Output a Markdown report (no file writes) with the following structure: + +## Specification Analysis Report + +| ID | Category | Severity | Location(s) | Summary | Recommendation | +|----|----------|----------|-------------|---------|----------------| +| A1 | Duplication | HIGH | spec.md:L120-134 | Two similar requirements ... | Merge phrasing; keep clearer version | + +(Add one row per finding; generate stable IDs prefixed by category initial.) + +**Coverage Summary Table:** + +| Requirement Key | Has Task? | Task IDs | Notes | +|-----------------|-----------|----------|-------| + +**Constitution Alignment Issues:** (if any) + +**Unmapped Tasks:** (if any) + +**Metrics:** + +- Total Requirements +- Total Tasks +- Coverage % (requirements with >=1 task) +- Ambiguity Count +- Duplication Count +- Critical Issues Count + +### 7. Provide Next Actions + +At end of report, output a concise Next Actions block: + +- If CRITICAL issues exist: Recommend resolving before `/speckit.implement` +- If only LOW/MEDIUM: User may proceed, but provide improvement suggestions +- Provide explicit command suggestions: e.g., "Run /speckit.specify with refinement", "Run /speckit.plan to adjust architecture", "Manually edit tasks.md to add coverage for 'performance-metrics'" + +### 8. Offer Remediation + +Ask the user: "Would you like me to suggest concrete remediation edits for the top N issues?" (Do NOT apply them automatically.) + +## Operating Principles + +### Context Efficiency + +- **Minimal high-signal tokens**: Focus on actionable findings, not exhaustive documentation +- **Progressive disclosure**: Load artifacts incrementally; don't dump all content into analysis +- **Token-efficient output**: Limit findings table to 50 rows; summarize overflow +- **Deterministic results**: Rerunning without changes should produce consistent IDs and counts + +### Analysis Guidelines + +- **NEVER modify files** (this is read-only analysis) +- **NEVER hallucinate missing sections** (if absent, report them accurately) +- **Prioritize constitution violations** (these are always CRITICAL) +- **Use examples over exhaustive rules** (cite specific instances, not generic patterns) +- **Report zero issues gracefully** (emit success report with coverage statistics) + +## Context + +$ARGUMENTS diff --git a/.agent/workflows/speckit.checklist.md b/.agent/workflows/speckit.checklist.md new file mode 100644 index 0000000..970e6c9 --- /dev/null +++ b/.agent/workflows/speckit.checklist.md @@ -0,0 +1,294 @@ +--- +description: Generate a custom checklist for the current feature based on user requirements. +--- + +## Checklist Purpose: "Unit Tests for English" + +**CRITICAL CONCEPT**: Checklists are **UNIT TESTS FOR REQUIREMENTS WRITING** - they validate the quality, clarity, and completeness of requirements in a given domain. + +**NOT for verification/testing**: + +- ❌ NOT "Verify the button clicks correctly" +- ❌ NOT "Test error handling works" +- ❌ NOT "Confirm the API returns 200" +- ❌ NOT checking if code/implementation matches the spec + +**FOR requirements quality validation**: + +- βœ… "Are visual hierarchy requirements defined for all card types?" (completeness) +- βœ… "Is 'prominent display' quantified with specific sizing/positioning?" (clarity) +- βœ… "Are hover state requirements consistent across all interactive elements?" (consistency) +- βœ… "Are accessibility requirements defined for keyboard navigation?" (coverage) +- βœ… "Does the spec define what happens when logo image fails to load?" (edge cases) + +**Metaphor**: If your spec is code written in English, the checklist is its unit test suite. You're testing whether the requirements are well-written, complete, unambiguous, and ready for implementation - NOT whether the implementation works. + +## User Input + +```text +$ARGUMENTS +``` + +You **MUST** consider the user input before proceeding (if not empty). + +## Execution Steps + +1. **Setup**: Run `.specify/scripts/bash/check-prerequisites.sh --json` from repo root and parse JSON for FEATURE_DIR and AVAILABLE_DOCS list. + - All file paths must be absolute. + - For single quotes in args like "I'm Groot", use escape syntax: e.g 'I'\''m Groot' (or double-quote if possible: "I'm Groot"). + +2. **Clarify intent (dynamic)**: Derive up to THREE initial contextual clarifying questions (no pre-baked catalog). They MUST: + - Be generated from the user's phrasing + extracted signals from spec/plan/tasks + - Only ask about information that materially changes checklist content + - Be skipped individually if already unambiguous in `$ARGUMENTS` + - Prefer precision over breadth + + Generation algorithm: + 1. Extract signals: feature domain keywords (e.g., auth, latency, UX, API), risk indicators ("critical", "must", "compliance"), stakeholder hints ("QA", "review", "security team"), and explicit deliverables ("a11y", "rollback", "contracts"). + 2. Cluster signals into candidate focus areas (max 4) ranked by relevance. + 3. Identify probable audience & timing (author, reviewer, QA, release) if not explicit. + 4. Detect missing dimensions: scope breadth, depth/rigor, risk emphasis, exclusion boundaries, measurable acceptance criteria. + 5. Formulate questions chosen from these archetypes: + - Scope refinement (e.g., "Should this include integration touchpoints with X and Y or stay limited to local module correctness?") + - Risk prioritization (e.g., "Which of these potential risk areas should receive mandatory gating checks?") + - Depth calibration (e.g., "Is this a lightweight pre-commit sanity list or a formal release gate?") + - Audience framing (e.g., "Will this be used by the author only or peers during PR review?") + - Boundary exclusion (e.g., "Should we explicitly exclude performance tuning items this round?") + - Scenario class gap (e.g., "No recovery flows detectedβ€”are rollback / partial failure paths in scope?") + + Question formatting rules: + - If presenting options, generate a compact table with columns: Option | Candidate | Why It Matters + - Limit to A–E options maximum; omit table if a free-form answer is clearer + - Never ask the user to restate what they already said + - Avoid speculative categories (no hallucination). If uncertain, ask explicitly: "Confirm whether X belongs in scope." + + Defaults when interaction impossible: + - Depth: Standard + - Audience: Reviewer (PR) if code-related; Author otherwise + - Focus: Top 2 relevance clusters + + Output the questions (label Q1/Q2/Q3). After answers: if β‰₯2 scenario classes (Alternate / Exception / Recovery / Non-Functional domain) remain unclear, you MAY ask up to TWO more targeted follow‑ups (Q4/Q5) with a one-line justification each (e.g., "Unresolved recovery path risk"). Do not exceed five total questions. Skip escalation if user explicitly declines more. + +3. **Understand user request**: Combine `$ARGUMENTS` + clarifying answers: + - Derive checklist theme (e.g., security, review, deploy, ux) + - Consolidate explicit must-have items mentioned by user + - Map focus selections to category scaffolding + - Infer any missing context from spec/plan/tasks (do NOT hallucinate) + +4. **Load feature context**: Read from FEATURE_DIR: + - spec.md: Feature requirements and scope + - plan.md (if exists): Technical details, dependencies + - tasks.md (if exists): Implementation tasks + + **Context Loading Strategy**: + - Load only necessary portions relevant to active focus areas (avoid full-file dumping) + - Prefer summarizing long sections into concise scenario/requirement bullets + - Use progressive disclosure: add follow-on retrieval only if gaps detected + - If source docs are large, generate interim summary items instead of embedding raw text + +5. **Generate checklist** - Create "Unit Tests for Requirements": + - Create `FEATURE_DIR/checklists/` directory if it doesn't exist + - Generate unique checklist filename: + - Use short, descriptive name based on domain (e.g., `ux.md`, `api.md`, `security.md`) + - Format: `[domain].md` + - If file exists, append to existing file + - Number items sequentially starting from CHK001 + - Each `/speckit.checklist` run creates a NEW file (never overwrites existing checklists) + + **CORE PRINCIPLE - Test the Requirements, Not the Implementation**: + Every checklist item MUST evaluate the REQUIREMENTS THEMSELVES for: + - **Completeness**: Are all necessary requirements present? + - **Clarity**: Are requirements unambiguous and specific? + - **Consistency**: Do requirements align with each other? + - **Measurability**: Can requirements be objectively verified? + - **Coverage**: Are all scenarios/edge cases addressed? + + **Category Structure** - Group items by requirement quality dimensions: + - **Requirement Completeness** (Are all necessary requirements documented?) + - **Requirement Clarity** (Are requirements specific and unambiguous?) + - **Requirement Consistency** (Do requirements align without conflicts?) + - **Acceptance Criteria Quality** (Are success criteria measurable?) + - **Scenario Coverage** (Are all flows/cases addressed?) + - **Edge Case Coverage** (Are boundary conditions defined?) + - **Non-Functional Requirements** (Performance, Security, Accessibility, etc. - are they specified?) + - **Dependencies & Assumptions** (Are they documented and validated?) + - **Ambiguities & Conflicts** (What needs clarification?) + + **HOW TO WRITE CHECKLIST ITEMS - "Unit Tests for English"**: + + ❌ **WRONG** (Testing implementation): + - "Verify landing page displays 3 episode cards" + - "Test hover states work on desktop" + - "Confirm logo click navigates home" + + βœ… **CORRECT** (Testing requirements quality): + - "Are the exact number and layout of featured episodes specified?" [Completeness] + - "Is 'prominent display' quantified with specific sizing/positioning?" [Clarity] + - "Are hover state requirements consistent across all interactive elements?" [Consistency] + - "Are keyboard navigation requirements defined for all interactive UI?" [Coverage] + - "Is the fallback behavior specified when logo image fails to load?" [Edge Cases] + - "Are loading states defined for asynchronous episode data?" [Completeness] + - "Does the spec define visual hierarchy for competing UI elements?" [Clarity] + + **ITEM STRUCTURE**: + Each item should follow this pattern: + - Question format asking about requirement quality + - Focus on what's WRITTEN (or not written) in the spec/plan + - Include quality dimension in brackets [Completeness/Clarity/Consistency/etc.] + - Reference spec section `[Spec Β§X.Y]` when checking existing requirements + - Use `[Gap]` marker when checking for missing requirements + + **EXAMPLES BY QUALITY DIMENSION**: + + Completeness: + - "Are error handling requirements defined for all API failure modes? [Gap]" + - "Are accessibility requirements specified for all interactive elements? [Completeness]" + - "Are mobile breakpoint requirements defined for responsive layouts? [Gap]" + + Clarity: + - "Is 'fast loading' quantified with specific timing thresholds? [Clarity, Spec Β§NFR-2]" + - "Are 'related episodes' selection criteria explicitly defined? [Clarity, Spec Β§FR-5]" + - "Is 'prominent' defined with measurable visual properties? [Ambiguity, Spec Β§FR-4]" + + Consistency: + - "Do navigation requirements align across all pages? [Consistency, Spec Β§FR-10]" + - "Are card component requirements consistent between landing and detail pages? [Consistency]" + + Coverage: + - "Are requirements defined for zero-state scenarios (no episodes)? [Coverage, Edge Case]" + - "Are concurrent user interaction scenarios addressed? [Coverage, Gap]" + - "Are requirements specified for partial data loading failures? [Coverage, Exception Flow]" + + Measurability: + - "Are visual hierarchy requirements measurable/testable? [Acceptance Criteria, Spec Β§FR-1]" + - "Can 'balanced visual weight' be objectively verified? [Measurability, Spec Β§FR-2]" + + **Scenario Classification & Coverage** (Requirements Quality Focus): + - Check if requirements exist for: Primary, Alternate, Exception/Error, Recovery, Non-Functional scenarios + - For each scenario class, ask: "Are [scenario type] requirements complete, clear, and consistent?" + - If scenario class missing: "Are [scenario type] requirements intentionally excluded or missing? [Gap]" + - Include resilience/rollback when state mutation occurs: "Are rollback requirements defined for migration failures? [Gap]" + + **Traceability Requirements**: + - MINIMUM: β‰₯80% of items MUST include at least one traceability reference + - Each item should reference: spec section `[Spec Β§X.Y]`, or use markers: `[Gap]`, `[Ambiguity]`, `[Conflict]`, `[Assumption]` + - If no ID system exists: "Is a requirement & acceptance criteria ID scheme established? [Traceability]" + + **Surface & Resolve Issues** (Requirements Quality Problems): + Ask questions about the requirements themselves: + - Ambiguities: "Is the term 'fast' quantified with specific metrics? [Ambiguity, Spec Β§NFR-1]" + - Conflicts: "Do navigation requirements conflict between Β§FR-10 and Β§FR-10a? [Conflict]" + - Assumptions: "Is the assumption of 'always available podcast API' validated? [Assumption]" + - Dependencies: "Are external podcast API requirements documented? [Dependency, Gap]" + - Missing definitions: "Is 'visual hierarchy' defined with measurable criteria? [Gap]" + + **Content Consolidation**: + - Soft cap: If raw candidate items > 40, prioritize by risk/impact + - Merge near-duplicates checking the same requirement aspect + - If >5 low-impact edge cases, create one item: "Are edge cases X, Y, Z addressed in requirements? [Coverage]" + + **🚫 ABSOLUTELY PROHIBITED** - These make it an implementation test, not a requirements test: + - ❌ Any item starting with "Verify", "Test", "Confirm", "Check" + implementation behavior + - ❌ References to code execution, user actions, system behavior + - ❌ "Displays correctly", "works properly", "functions as expected" + - ❌ "Click", "navigate", "render", "load", "execute" + - ❌ Test cases, test plans, QA procedures + - ❌ Implementation details (frameworks, APIs, algorithms) + + **βœ… REQUIRED PATTERNS** - These test requirements quality: + - βœ… "Are [requirement type] defined/specified/documented for [scenario]?" + - βœ… "Is [vague term] quantified/clarified with specific criteria?" + - βœ… "Are requirements consistent between [section A] and [section B]?" + - βœ… "Can [requirement] be objectively measured/verified?" + - βœ… "Are [edge cases/scenarios] addressed in requirements?" + - βœ… "Does the spec define [missing aspect]?" + +6. **Structure Reference**: Generate the checklist following the canonical template in `.specify/templates/checklist-template.md` for title, meta section, category headings, and ID formatting. If template is unavailable, use: H1 title, purpose/created meta lines, `##` category sections containing `- [ ] CHK### ` lines with globally incrementing IDs starting at CHK001. + +7. **Report**: Output full path to created checklist, item count, and remind user that each run creates a new file. Summarize: + - Focus areas selected + - Depth level + - Actor/timing + - Any explicit user-specified must-have items incorporated + +**Important**: Each `/speckit.checklist` command invocation creates a checklist file using short, descriptive names unless file already exists. This allows: + +- Multiple checklists of different types (e.g., `ux.md`, `test.md`, `security.md`) +- Simple, memorable filenames that indicate checklist purpose +- Easy identification and navigation in the `checklists/` folder + +To avoid clutter, use descriptive types and clean up obsolete checklists when done. + +## Example Checklist Types & Sample Items + +**UX Requirements Quality:** `ux.md` + +Sample items (testing the requirements, NOT the implementation): + +- "Are visual hierarchy requirements defined with measurable criteria? [Clarity, Spec Β§FR-1]" +- "Is the number and positioning of UI elements explicitly specified? [Completeness, Spec Β§FR-1]" +- "Are interaction state requirements (hover, focus, active) consistently defined? [Consistency]" +- "Are accessibility requirements specified for all interactive elements? [Coverage, Gap]" +- "Is fallback behavior defined when images fail to load? [Edge Case, Gap]" +- "Can 'prominent display' be objectively measured? [Measurability, Spec Β§FR-4]" + +**API Requirements Quality:** `api.md` + +Sample items: + +- "Are error response formats specified for all failure scenarios? [Completeness]" +- "Are rate limiting requirements quantified with specific thresholds? [Clarity]" +- "Are authentication requirements consistent across all endpoints? [Consistency]" +- "Are retry/timeout requirements defined for external dependencies? [Coverage, Gap]" +- "Is versioning strategy documented in requirements? [Gap]" + +**Performance Requirements Quality:** `performance.md` + +Sample items: + +- "Are performance requirements quantified with specific metrics? [Clarity]" +- "Are performance targets defined for all critical user journeys? [Coverage]" +- "Are performance requirements under different load conditions specified? [Completeness]" +- "Can performance requirements be objectively measured? [Measurability]" +- "Are degradation requirements defined for high-load scenarios? [Edge Case, Gap]" + +**Security Requirements Quality:** `security.md` + +Sample items: + +- "Are authentication requirements specified for all protected resources? [Coverage]" +- "Are data protection requirements defined for sensitive information? [Completeness]" +- "Is the threat model documented and requirements aligned to it? [Traceability]" +- "Are security requirements consistent with compliance obligations? [Consistency]" +- "Are security failure/breach response requirements defined? [Gap, Exception Flow]" + +## Anti-Examples: What NOT To Do + +**❌ WRONG - These test implementation, not requirements:** + +```markdown +- [ ] CHK001 - Verify landing page displays 3 episode cards [Spec Β§FR-001] +- [ ] CHK002 - Test hover states work correctly on desktop [Spec Β§FR-003] +- [ ] CHK003 - Confirm logo click navigates to home page [Spec Β§FR-010] +- [ ] CHK004 - Check that related episodes section shows 3-5 items [Spec Β§FR-005] +``` + +**βœ… CORRECT - These test requirements quality:** + +```markdown +- [ ] CHK001 - Are the number and layout of featured episodes explicitly specified? [Completeness, Spec Β§FR-001] +- [ ] CHK002 - Are hover state requirements consistently defined for all interactive elements? [Consistency, Spec Β§FR-003] +- [ ] CHK003 - Are navigation requirements clear for all clickable brand elements? [Clarity, Spec Β§FR-010] +- [ ] CHK004 - Is the selection criteria for related episodes documented? [Gap, Spec Β§FR-005] +- [ ] CHK005 - Are loading state requirements defined for asynchronous episode data? [Gap] +- [ ] CHK006 - Can "visual hierarchy" requirements be objectively measured? [Measurability, Spec Β§FR-001] +``` + +**Key Differences:** + +- Wrong: Tests if the system works correctly +- Correct: Tests if the requirements are written correctly +- Wrong: Verification of behavior +- Correct: Validation of requirement quality +- Wrong: "Does it do X?" +- Correct: "Is X clearly specified?" diff --git a/.agent/workflows/speckit.clarify.md b/.agent/workflows/speckit.clarify.md new file mode 100644 index 0000000..8ff62c3 --- /dev/null +++ b/.agent/workflows/speckit.clarify.md @@ -0,0 +1,177 @@ +--- +description: Identify underspecified areas in the current feature spec by asking up to 5 highly targeted clarification questions and encoding answers back into the spec. +--- + +## User Input + +```text +$ARGUMENTS +``` + +You **MUST** consider the user input before proceeding (if not empty). + +## Outline + +Goal: Detect and reduce ambiguity or missing decision points in the active feature specification and record the clarifications directly in the spec file. + +Note: This clarification workflow is expected to run (and be completed) BEFORE invoking `/speckit.plan`. If the user explicitly states they are skipping clarification (e.g., exploratory spike), you may proceed, but must warn that downstream rework risk increases. + +Execution steps: + +1. Run `.specify/scripts/bash/check-prerequisites.sh --json --paths-only` from repo root **once** (combined `--json --paths-only` mode / `-Json -PathsOnly`). Parse minimal JSON payload fields: + - `FEATURE_DIR` + - `FEATURE_SPEC` + - (Optionally capture `IMPL_PLAN`, `TASKS` for future chained flows.) + - If JSON parsing fails, abort and instruct user to re-run `/speckit.specify` or verify feature branch environment. + - For single quotes in args like "I'm Groot", use escape syntax: e.g 'I'\''m Groot' (or double-quote if possible: "I'm Groot"). + +2. Load the current spec file. Perform a structured ambiguity & coverage scan using this taxonomy. For each category, mark status: Clear / Partial / Missing. Produce an internal coverage map used for prioritization (do not output raw map unless no questions will be asked). + + Functional Scope & Behavior: + - Core user goals & success criteria + - Explicit out-of-scope declarations + - User roles / personas differentiation + + Domain & Data Model: + - Entities, attributes, relationships + - Identity & uniqueness rules + - Lifecycle/state transitions + - Data volume / scale assumptions + + Interaction & UX Flow: + - Critical user journeys / sequences + - Error/empty/loading states + - Accessibility or localization notes + + Non-Functional Quality Attributes: + - Performance (latency, throughput targets) + - Scalability (horizontal/vertical, limits) + - Reliability & availability (uptime, recovery expectations) + - Observability (logging, metrics, tracing signals) + - Security & privacy (authN/Z, data protection, threat assumptions) + - Compliance / regulatory constraints (if any) + + Integration & External Dependencies: + - External services/APIs and failure modes + - Data import/export formats + - Protocol/versioning assumptions + + Edge Cases & Failure Handling: + - Negative scenarios + - Rate limiting / throttling + - Conflict resolution (e.g., concurrent edits) + + Constraints & Tradeoffs: + - Technical constraints (language, storage, hosting) + - Explicit tradeoffs or rejected alternatives + + Terminology & Consistency: + - Canonical glossary terms + - Avoided synonyms / deprecated terms + + Completion Signals: + - Acceptance criteria testability + - Measurable Definition of Done style indicators + + Misc / Placeholders: + - TODO markers / unresolved decisions + - Ambiguous adjectives ("robust", "intuitive") lacking quantification + + For each category with Partial or Missing status, add a candidate question opportunity unless: + - Clarification would not materially change implementation or validation strategy + - Information is better deferred to planning phase (note internally) + +3. Generate (internally) a prioritized queue of candidate clarification questions (maximum 5). Do NOT output them all at once. Apply these constraints: + - Maximum of 10 total questions across the whole session. + - Each question must be answerable with EITHER: + - A short multiple‑choice selection (2–5 distinct, mutually exclusive options), OR + - A one-word / short‑phrase answer (explicitly constrain: "Answer in <=5 words"). + - Only include questions whose answers materially impact architecture, data modeling, task decomposition, test design, UX behavior, operational readiness, or compliance validation. + - Ensure category coverage balance: attempt to cover the highest impact unresolved categories first; avoid asking two low-impact questions when a single high-impact area (e.g., security posture) is unresolved. + - Exclude questions already answered, trivial stylistic preferences, or plan-level execution details (unless blocking correctness). + - Favor clarifications that reduce downstream rework risk or prevent misaligned acceptance tests. + - If more than 5 categories remain unresolved, select the top 5 by (Impact * Uncertainty) heuristic. + +4. Sequential questioning loop (interactive): + - Present EXACTLY ONE question at a time. + - For multiple‑choice questions: + - **Analyze all options** and determine the **most suitable option** based on: + - Best practices for the project type + - Common patterns in similar implementations + - Risk reduction (security, performance, maintainability) + - Alignment with any explicit project goals or constraints visible in the spec + - Present your **recommended option prominently** at the top with clear reasoning (1-2 sentences explaining why this is the best choice). + - Format as: `**Recommended:** Option [X] - ` + - Then render all options as a Markdown table: + + | Option | Description | + |--------|-------------| + | A |