Digital-Threads · Shahinyanm · May 7, 2026 · May 6, 2026 · May 6, 2026 · May 6, 2026
diff --git a/.docs/plans/2026-05-06-v0.1.4-hardening.md b/.docs/plans/2026-05-06-v0.1.4-hardening.md
@@ -0,0 +1,89 @@
+# Epic A — v0.1.4 hardening
+
+**Date:** 2026-05-06
+**Branch:** `claude/youthful-shaw-b96d78`
+**Target release:** `0.1.4` (backwards-compatible patch)
+**Bd epic:** see `bd list --type epic` (assigned id at runtime)
+
+## Goal
+
+Ship a backwards-compatible patch that closes the most acute correctness, robustness, and OSS-hygiene gaps identified in the 2026-05-06 audit, **without breaking the public CLI/MCP contract**. Anything that requires a breaking change is deferred to Epic B (v0.2.0).
+
+## Success criteria
+
+1. `cargo test --workspace --all-targets` green on `ubuntu-latest`, `macos-latest`, `windows-latest`.
+2. `cargo audit` runs in CI and is clean (or vulnerabilities are accepted with documented reason).
+3. `cargo clippy --workspace --all-targets -- -D warnings` clean.
+4. `cargo doc --workspace --no-deps` clean with `RUSTDOCFLAGS=-D warnings`.
+5. New CI job pins MSRV (currently `1.83`) and verifies build.
+6. `CHANGELOG.md` exists and documents `0.1.4` entry.
+7. No removed/renamed CLI flags. No removed MCP tools or required parameters.
+8. Branch `claude/youthful-shaw-b96d78` pushed; PR opened against `main`.
+
+## Out of scope (deferred)
+
+- Incremental indexing / pack-cache fix (Epic B — perf)
+- MCP error contract redesign (Epic B — breaking)
+- `--project-dir` argument for MCP (Epic B)
+- Migrations framework (Epic B — coupled with incremental indexing)
+- Few-shot prompting + eval datasets (Epic C — quality)
+- `task-journal doctor`, `migrate-project` (Epic C — DX)
+
+## Tasks (11)
+
+Each task is one atomic commit. Test-first when behavior changes; doc/CI-only tasks may skip the failing-test step.
+
+| # | Task | Touches | Test? | Notes |
+|---|------|---------|-------|-------|
+| A1 | HTTP timeout for `AnthropicClassifier` | `classifier/http.rs` | yes (mockito slow-server) | 15s connect+read timeout. Hardcoded — env-var override deferred. |
+| A2 | Graceful skip of malformed JSONL lines in `rebuild_state` | `db.rs` | yes (jsonl with bad line) | Log a `tracing::warn!` and continue; total parsed count returned. |
+| A3 | Classifier model overridable via env var | `classifier/http.rs`, `classifier/cli.rs` | yes (env unset → default; env set → override) | `TJ_CLASSIFIER_MODEL`; default unchanged. |
+| A4 | Extend task_id from 6 → 10 characters | `crates/tj-mcp/src/main.rs`, `crates/tj-cli/src/main.rs` | yes (collision-free over 10k synthetic ids) | Old 6-char ids remain valid (string compare). |
+| A5 | Remove `stub: bool` from MCP responses | `crates/tj-mcp/src/main.rs`, smoke tests | yes (smoke test asserts no `stub` field) | Field removal — but no client read it; documented in CHANGELOG. |
+| A6 | Centralize `SCHEMA_VERSION` const | `tj-core/src/lib.rs`, `pack.rs`, `tj-mcp/src/main.rs` | yes (single source) | `pub const SCHEMA_VERSION: &str = "1.0";` |
+| A7 | `CHANGELOG.md` with Keep-a-Changelog format | new file | n/a | Backfill `0.1.0`–`0.1.3` from `git log`. |
+| A8 | `cargo-audit` job in CI | `.github/workflows/ci.yml` | n/a | Non-blocking initially; flips to blocking once green. |
+| A9 | MSRV job in CI (`rust-version` = 1.83) | `.github/workflows/ci.yml` | n/a | Uses `dtolnay/rust-toolchain@1.83`. |
+| A10 | `.editorconfig` | new file | n/a | LF, UTF-8, 4-space rust, 2-space yaml. |
+| A11 | File-lock on JSONL append | `tj-core/src/storage.rs`, `Cargo.toml` | yes (two-writer race test) | Crate: `fd-lock` (cross-platform). Blocking lock. |
+
+## Sequencing
+
+```
+A6 ──┐
+A1 ──┼─→ A7 (CHANGELOG references all done work)
+A2 ──┤
+A3 ──┤
+A4 ──┤
+A5 ──┤
+A10 ─┤
+A8 ──┤
+A9 ──┘
+                A11 last (fd-lock dep + race test)
+```
+
+A11 last because it adds a runtime dependency and a flaky-prone test; everything else lands first so green CI is the baseline before introducing the lock.
+
+## Risks
+
+- **A4 task_id length change:** new ids longer; nothing reads fixed-width. Verified by smart_read of CLI/MCP code paths.
+- **A5 `stub` removal:** technically a schema change, but `stub` was always false post-Phase-1. Documented as non-breaking in CHANGELOG; if any downstream tool actually reads it, we revert in 0.1.5.
+- **A11 fd-lock on Windows:** `fd-lock` uses `LockFileEx` on Windows; behavior differs from Linux `flock`. Test must cover both.
+- **A2 swallowing real corruption:** mitigation — log at `warn!` level with line number and parse error.
+
+## Verification (per task)
+
+1. `cargo fmt --all --check`
+2. `cargo clippy --workspace --all-targets -- -D warnings`
+3. `cargo test --workspace --all-targets` (specific test for the touched module)
+4. `git diff --stat` reviewed (no unintended line-ending or whitespace flips)
+5. Commit with conventional-commit prefix (`fix:`, `chore:`, `docs:`, `ci:`, `feat:`)
+6. `bd update <id> --status closed --reason "<one-line>"`
+
+## Final verification (epic-level)
+
+- `cargo test --workspace --all-targets` green
+- `cargo audit` clean
+- `bd list --parent <epic-id> --status open` returns empty
+- `git log --oneline 8c49785..HEAD` matches the 11 tasks 1:1
+- `gh pr create` opened against `main` with the CHANGELOG entry as body
diff --git a/.docs/plans/2026-05-06-v0.2.0-epic-c-pr-body.md b/.docs/plans/2026-05-06-v0.2.0-epic-c-pr-body.md
@@ -0,0 +1,58 @@
+## Summary
+
+Epic C — quality / DX / community polish. **8 atomic commits** on `claude/v0.2.0-epic-c`, built off `claude/v0.2.0-epic-b` HEAD.
+
+> **Merge order:** epic A → main, then epic B (rebased on main), then this branch (rebased on main).
+
+Plan: [`.docs/plans/2026-05-06-v0.2.0-epic-c-quality.md`](./.docs/plans/2026-05-06-v0.2.0-epic-c-quality.md)
+
+### What changed
+
+**Classifier quality**
+- `feat(classifier)` — six few-shot Input/Output examples in the prompt covering the harder boundary calls (hypothesis vs finding, finding vs evidence, decision vs hypothesis). Prompt-budget guard still passes.
+- `test(classifier)` — 30-row labeled eval fixture + opt-in accuracy gate (`TJ_CLASSIFIER_EVAL=on`). Default mode runs hermetic shape tests; opt-in mode calls `ClaudeCliClassifier::default()` and asserts ≥ 0.70 accuracy. Floor will ratchet up after 100+ dogfood examples.
+
+**User-facing DX**
+- `feat(cli)` — `task-journal doctor` self-check command with human + `--json` output. Reports claude-on-PATH, data-dir writability, known projects, schema migrations.
+- `feat(cli)` — `task-journal migrate-project --from PATH --to PATH [--force]`. Renames JSONL/SQLite/metrics from old project_hash to new; UPDATEs `tasks.project_hash` and `index_state.project_hash` columns in SQLite.
+- `feat(export)` — `export --format html` produces a self-contained timeline page (inline CSS, no external assets, dark-mode aware via `prefers-color-scheme`).
+
+**OSS / coverage / Windows**
+- `chore` — `CONTRIBUTING.md`, `CODE_OF_CONDUCT.md` (Contributor Covenant 2.1 ref), three `.github/ISSUE_TEMPLATE/*`, `.github/PULL_REQUEST_TEMPLATE.md`, README links.
+- `ci` — `cargo-llvm-cov` job + Codecov upload + README badge. Non-blocking initially.
+- `test(classifier)` — cross-platform fake-claude shim (`.sh`/`cat` on Unix, `.cmd`/`type` on Windows). The two `ClaudeCliClassifier` tests now run on all three CI matrix OS instead of `cfg(unix)` only.
+
+### Verification
+
+- `cargo fmt --all -- --check` ✅
+- `cargo clippy --workspace --all-targets -- -D warnings` ✅
+- `cargo test --workspace --all-targets` ✅ — **202 tests** (was 193 from epic B; +9 added by this PR)
+- `cargo bench --workspace --no-run` ✅
+
+### New CLI surface
+
+| Command | Purpose |
+|---------|---------|
+| `task-journal doctor [--json]` | Diagnostic check; non-zero exit on issues |
+| `task-journal migrate-project --from PATH --to PATH [--force]` | Re-key on-disk data when project moves |
+| `task-journal export --format html [--task ID]` | Self-contained HTML timeline |
+
+### New env vars
+
+| Var | Effect |
+|-----|--------|
+| `TJ_CLASSIFIER_EVAL=on` | Enables the real-classifier accuracy run in `cargo test`. Default OFF — CI stays hermetic. |
+
+### Test plan
+
+- [ ] Branch CI green on three OS for `test`, `msrv`, `audit`, `benches-compile`, `coverage` (new).
+- [ ] Try `task-journal doctor` on a clean VM — confirms claude-binary detection and dir-writability checks.
+- [ ] Move a project on disk, run `migrate-project`, confirm `task_pack` works in the new location.
+- [ ] `task-journal export --format html --task tj-X > timeline.html` and open in browser; verify dark mode + no broken layout.
+- [ ] (Optional, manual) `TJ_CLASSIFIER_EVAL=on cargo test classifier_meets_accuracy_floor` against the real `claude` CLI; record baseline accuracy.
+
+### After this lands
+
+`v0.2.0` final tag. No further code changes expected — the dogfood window from `0.2.0-rc.1` already exercised epic B; this epic is additive and behind-the-scenes for almost every existing user.
+
+🤖 Generated with [Claude Code](https://claude.com/claude-code)
diff --git a/.docs/plans/2026-05-06-v0.2.0-epic-c-quality.md b/.docs/plans/2026-05-06-v0.2.0-epic-c-quality.md
@@ -0,0 +1,84 @@
+# Epic C — v0.2.0: quality, DX, community polish
+
+**Date:** 2026-05-06
+**Branch:** `claude/v0.2.0-epic-c` (off `claude/v0.2.0-epic-b` HEAD)
+**Target release:** `0.2.0` final (after epic B's `rc.1` is dogfooded and this PR merges)
+**Bd epic:** `claude-memory-1yc`
+
+## Goal
+
+Three thematic threads, deliberately bundled because none alone deserves a major version but together they raise the project from "works" to "feels finished":
+
+1. **Classifier quality** — make the auto-capture hook actually trust-worthy with few-shot prompting and a regression-gated accuracy floor.
+2. **User-facing DX** — `doctor` (diagnostic), `migrate-project` (path moved), HTML timeline (PR review).
+3. **Community / coverage / Windows** — OSS hygiene files, llvm-cov badge, Windows test parity for the CLI classifier.
+
+## Success criteria
+
+1. `cargo test --workspace --all-targets` green on three OS — including the previously-skipped `cfg(unix)`-only classifier tests.
+2. New `tests/classifier_eval.rs` runs against a checked-in labeled dataset and enforces an accuracy floor; CI fails when the floor is broken.
+3. `task-journal doctor` exits 0 on a healthy install and emits a machine-readable summary that flags missing `claude` CLI / unwritable data dirs / unknown migrations.
+4. `task-journal migrate-project --from <old> --to <new>` re-keys the JSONL + SQLite + metrics for the new project hash; round-trips through `task_pack`.
+5. `task-journal export --format html --task <id>` emits a self-contained HTML timeline.
+6. Coverage report: `cargo llvm-cov --workspace` runs in CI and uploads to Codecov; README badge reflects the status.
+7. `CONTRIBUTING.md`, `CODE_OF_CONDUCT.md`, `.github/ISSUE_TEMPLATE/*`, `.github/PULL_REQUEST_TEMPLATE.md` exist and link from README.
+8. PR opened against `main` (after `0.2.0-rc.1` is in main).
+
+## Non-goals (deferred)
+
+- Opt-in telemetry endpoint (requires hosted backend — separate decision).
+- C/C++/server-side LSP integration.
+- Multi-language classifier prompts.
+
+## Tasks (8)
+
+| # | Task | Touches | Test? | Notes |
+|---|------|---------|-------|-------|
+| C1 | OSS hygiene files: `CONTRIBUTING.md`, `CODE_OF_CONDUCT.md`, issue + PR templates | repo root + `.github/` | n/a | Standard OSS scaffolding; not blocking other work. |
+| C2 | `cargo-llvm-cov` job in CI + Codecov upload + README badge | `.github/workflows/ci.yml`, `README.md` | n/a | Non-blocking initially; flip threshold to blocking after 5 baselines. |
+| C3 | Windows-compatible tests for `ClaudeCliClassifier` (currently `cfg(all(test, unix))`) | `crates/tj-core/src/classifier/cli.rs` | yes (port the two existing fake-claude tests to use `.cmd`/`.bat` shim on Windows) | Closes the platform gap noticed in the audit. |
+| C4 | `task-journal doctor` command | `tj-cli/src/main.rs`, possibly small `tj-core::diagnostics` mod | yes (CLI integration test) | Checks: claude bin in PATH, data dirs writable, schema_migrations matches expected, last_indexed_event_id consistent. |
+| C5 | `task-journal migrate-project --from <path> --to <path>` | `tj-cli/src/main.rs`, `tj-core::project_hash`, fs ops | yes | Renames `<old_hash>.jsonl`, `<old_hash>.sqlite`, `<old_hash>.jsonl` in metrics, etc. |
+| C6 | `task-journal export --format html [--task <id>]` | `tj-cli/src/main.rs` (existing `export` command), new tiny `html_timeline` helper | yes | Self-contained: inline CSS, no external assets. |
+| C7 | Few-shot prompting in classifier | `tj-core/src/classifier/prompt.rs` | yes (prompt contains 6 examples; size still bounded < 64KB) | 2 examples per harder pair: hypothesis vs finding, finding vs evidence, decision vs hypothesis. |
+| C8 | Classifier eval dataset + accuracy gate | `tj-core/tests/classifier_eval.rs`, `tj-core/tests/fixtures/classifier_eval.jsonl` | yes (eval test enforces ≥ 70% baseline) | Hand-label ~30 chunks; uses `MockClassifier` + golden expected outputs to keep deterministic; real-classifier path stays opt-in via env var so CI does not need API access. |
+
+## Sequencing
+
+```
+C1 ─┐
+C2 ─┤
+C3 ─┼─→ (independent)
+C4 ─┤
+C5 ─┤
+C6 ─┘
+
+C7 ─→ C8 (eval validates the new prompt against the dataset)
+```
+
+C1/C2/C3 can land in any order. C4/C5/C6 are independent CLI features. C7 unlocks C8 (the eval dataset is the way to *measure* that few-shot improved precision rather than degraded it).
+
+## Risks
+
+- **C7 prompt regression:** few-shot can over-fit examples and degrade on out-of-distribution chunks. Mitigation: eval set in C8 covers boundary cases (`hypothesis-not-finding`, etc).
+- **C8 false confidence:** ≥70% on 30 examples is a noisy estimate. Mitigation: ratchet floor up only after collecting 100+ labeled examples in dogfooding.
+- **C5 destructive migration:** if `--from` and `--to` resolve to the same hash (symlink, case-insensitive FS), we'd corrupt data. Mitigation: refuse when `from_hash == to_hash`; require `--force` to overwrite an existing destination.
+- **C3 Windows shim:** rewriting the fake-claude test in PowerShell vs `.cmd` vs Python — pick `.cmd` for minimal surface; some Windows tests skip on lack of `cmd.exe` is acceptable.
+
+## Verification gate (per task)
+
+Same as Epic A/B:
+1. `cargo fmt --all -- --check`
+2. `cargo clippy --workspace --all-targets -- -D warnings`
+3. `cargo test --workspace --all-targets`
+4. `git diff --stat` review
+5. Conventional-commit prefix
+6. `bd close <id> --reason "..."`
+
+## Final verification (epic-level)
+
+- All 8 sub-tasks closed in bd
+- `cargo bench --workspace --no-run` clean
+- `cargo llvm-cov --workspace --summary-only` reports a number
+- `task-journal doctor` runs locally and prints the diagnostics
+- PR body lists which features changed user-facing CLI surface