Merged

39 commits

- `c50205e` chore(lint): clear rustc 1.95 clippy and fmt regressions (Shahinyanm, May 6, 2026)
- `843a689` docs(plan): epic A — v0.1.4 hardening plan (Shahinyanm, May 6, 2026)
- `2e6cae8` fix(db): rebuild_state skips malformed JSONL lines instead of aborting (Shahinyanm, May 6, 2026)
- `b95279c` fix(http): add 15s timeout to AnthropicClassifier requests (Shahinyanm, May 6, 2026)
- `feb3724` refactor(core): centralize SCHEMA_VERSION as single const (Shahinyanm, May 6, 2026)
- `89ed91f` fix(id): extend task_id from 6 to 10 characters (Shahinyanm, May 6, 2026)
- `20fafa4` chore(mcp): remove vestigial stub:bool from all responses (Shahinyanm, May 6, 2026)
- `2eb87e1` chore: add .editorconfig at repo root (Shahinyanm, May 6, 2026)
- `ec19e71` ci: add MSRV job pinning rust 1.83 (Shahinyanm, May 6, 2026)
- `7e421d1` feat(classifier): TJ_CLASSIFIER_MODEL env var overrides hardcoded model (Shahinyanm, May 6, 2026)
- `dd7b007` ci: add cargo-audit job for security advisories (Shahinyanm, May 6, 2026)
- `b8ab012` fix(storage): exclusive file lock around JsonlWriter append (race-saf… (Shahinyanm, May 6, 2026)
- `8e8ca81` docs: add CHANGELOG.md (Keep-a-Changelog) covering 0.1.0..0.1.4 (Shahinyanm, May 6, 2026)
- `35ab19b` feat(db): migrations framework with schema_migrations table (Shahinyanm, May 6, 2026)
- `3391d76` perf(db): incremental indexing — ingest only the JSONL tail since las… (Shahinyanm, May 6, 2026)
- `2182b5e` perf(pack): regression test for working pack-cache after incremental … (Shahinyanm, May 6, 2026)
- `d9d9016` feat(mcp)!: structured RPC error envelope (BREAKING) (Shahinyanm, May 6, 2026)
- `c01f433` fix(mcp,cli): validate task_id exists before recording close event (Shahinyanm, May 6, 2026)
- `9686081` feat(mcp): --project-dir argument overrides cwd (Shahinyanm, May 6, 2026)
- `571aebe` perf(mcp): wrap blocking I/O in tokio::task::spawn_blocking (Shahinyanm, May 6, 2026)
- `c2d6392` perf(bench): criterion benches for rebuild_state, pack assemble, FTS … (Shahinyanm, May 6, 2026)
- `0a6bf5c` release: bump workspace version to 0.2.0-rc.1 (Shahinyanm, May 6, 2026)
- `eb1e8b8` chore: OSS hygiene — CONTRIBUTING, CoC, issue and PR templates (Shahinyanm, May 7, 2026)
- `7acf918` ci: cargo-llvm-cov coverage job + Codecov upload + README badge (Shahinyanm, May 7, 2026)
- `78f1017` test(classifier): cross-platform fake-claude shim, drop cfg(unix) gate (Shahinyanm, May 7, 2026)
- `837daab` feat(cli): task-journal doctor diagnostics command (Shahinyanm, May 7, 2026)
- `d71e0e4` feat(cli): task-journal migrate-project --from PATH --to PATH (Shahinyanm, May 7, 2026)
- `8065cb2` feat(export): HTML timeline output (export --format html) (Shahinyanm, May 7, 2026)
- `44ce215` feat(classifier): few-shot examples in prompt (Shahinyanm, May 7, 2026)
- `433bf67` test(classifier): labeled eval dataset + opt-in accuracy gate (Shahinyanm, May 7, 2026)
- `e227662` docs: epic C PR body for review (Shahinyanm, May 7, 2026)
- `6e5e5a9` perf(mcp): cache SQLite connections per state-path (Shahinyanm, May 7, 2026)
- `055a97e` feat(export): task-journal export --format sqlite (Shahinyanm, May 7, 2026)
- `d7ce128` feat(cli): pending list and retry visibility (Shahinyanm, May 7, 2026)
- `958133a` test(mcp): rmcp client + transport compile-and-shape integration test (Shahinyanm, May 7, 2026)
- `1874740` feat(mcp): structured tracing with correlation_id per tool call (Shahinyanm, May 7, 2026)
- `80b9afe` feat(mcp): graceful SIGTERM and Ctrl-C shutdown (Shahinyanm, May 7, 2026)
- `dd71db3` release: bump workspace version to 0.2.1 (Shahinyanm, May 7, 2026)
- `1f56574` docs: epic D PR body for review (Shahinyanm, May 7, 2026)
89 changes: 89 additions & 0 deletions .docs/plans/2026-05-06-v0.1.4-hardening.md
@@ -0,0 +1,89 @@
# Epic A — v0.1.4 hardening

**Date:** 2026-05-06
**Branch:** `claude/youthful-shaw-b96d78`
**Target release:** `0.1.4` (backwards-compatible patch)
**Bd epic:** see `bd list --type epic` (assigned id at runtime)

## Goal

Ship a backwards-compatible patch that closes the most acute correctness, robustness, and OSS-hygiene gaps identified in the 2026-05-06 audit, **without breaking the public CLI/MCP contract**. Anything that requires a breaking change is deferred to Epic B (v0.2.0).

## Success criteria

1. `cargo test --workspace --all-targets` green on `ubuntu-latest`, `macos-latest`, `windows-latest`.
2. `cargo audit` runs in CI and is clean (or vulnerabilities are accepted with documented reason).
3. `cargo clippy --workspace --all-targets -- -D warnings` clean.
4. `cargo doc --workspace --no-deps` clean with `RUSTDOCFLAGS=-D warnings`.
5. New CI job pins MSRV (currently `1.83`) and verifies the build.
6. `CHANGELOG.md` exists and documents `0.1.4` entry.
7. No removed/renamed CLI flags. No removed MCP tools or required parameters.
8. Branch `claude/youthful-shaw-b96d78` pushed; PR opened against `main`.

## Out of scope (deferred)

- Incremental indexing / pack-cache fix (Epic B — perf)
- MCP error contract redesign (Epic B — breaking)
- `--project-dir` argument for MCP (Epic B)
- Migrations framework (Epic B — coupled with incremental indexing)
- Few-shot prompting + eval datasets (Epic C — quality)
- `task-journal doctor`, `migrate-project` (Epic C — DX)

## Tasks (11)

Each task is one atomic commit. Test-first when behavior changes; doc/CI-only tasks may skip the failing-test step.

| # | Task | Touches | Test? | Notes |
|---|------|---------|-------|-------|
| A1 | HTTP timeout for `AnthropicClassifier` | `classifier/http.rs` | yes (mockito slow-server) | 15s connect+read timeout. Hardcoded — env-var override deferred. |
| A2 | Graceful skip of malformed JSONL lines in `rebuild_state` | `db.rs` | yes (jsonl with bad line) | Log a `tracing::warn!` and continue; total parsed count returned. |
| A3 | Classifier model overridable via env var | `classifier/http.rs`, `classifier/cli.rs` | yes (env unset → default; env set → override) | `TJ_CLASSIFIER_MODEL`; default unchanged. |
| A4 | Extend task_id from 6 → 10 characters | `crates/tj-mcp/src/main.rs`, `crates/tj-cli/src/main.rs` | yes (collision-free over 10k synthetic ids) | Old 6-char ids remain valid (string compare). |
| A5 | Remove `stub: bool` from MCP responses | `crates/tj-mcp/src/main.rs`, smoke tests | yes (smoke test asserts no `stub` field) | Field removal, but no known client reads it; documented in CHANGELOG. |
| A6 | Centralize `SCHEMA_VERSION` const | `tj-core/src/lib.rs`, `pack.rs`, `tj-mcp/src/main.rs` | yes (single source) | `pub const SCHEMA_VERSION: &str = "1.0";` |
| A7 | `CHANGELOG.md` with Keep-a-Changelog format | new file | n/a | Backfill `0.1.0`–`0.1.3` from `git log`. |
| A8 | `cargo-audit` job in CI | `.github/workflows/ci.yml` | n/a | Non-blocking initially; flips to blocking once green. |
| A9 | MSRV job in CI (`rust-version` = 1.83) | `.github/workflows/ci.yml` | n/a | Uses `dtolnay/rust-toolchain@1.83`. |
| A10 | `.editorconfig` | new file | n/a | LF, UTF-8, 4-space rust, 2-space yaml. |
| A11 | File-lock on JSONL append | `tj-core/src/storage.rs`, `Cargo.toml` | yes (two-writer race test) | Crate: `fd-lock` (cross-platform). Blocking lock. |
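For A4, a birthday-bound estimate shows why 10 characters leaves a comfortable margin. This is a sketch under stated assumptions: ids are taken to be drawn uniformly at random, and the alphabet size is a hypothetical parameter (`ALPHABET_SIZE = 36`, i.e. `a-z0-9`), since the plan does not state the real id alphabet.

```rust
/// Hypothetical alphabet size (a-z plus 0-9); the plan does not confirm the real alphabet.
pub const ALPHABET_SIZE: u64 = 36;

/// Approximate probability that at least one pair among `n` random ids collides,
/// using the birthday bound p ≈ 1 - exp(-n(n-1) / (2N)) with N = alphabet^len.
/// ASSUMPTION: ids are uniformly random over the full space.
pub fn collision_probability(n: u64, alphabet: u64, len: u32) -> f64 {
    let space = (alphabet as f64).powi(len as i32);
    let pairs = (n as f64) * ((n as f64) - 1.0) / 2.0;
    // -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    -(-pairs / space).exp_m1()
}
```

Under these assumptions, 10,000 ids carry roughly a 2% chance of at least one collision at 6 characters and on the order of 1e-8 at 10 characters. The "collision-free over 10k synthetic ids" test is therefore a smoke check rather than a proof.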

## Sequencing

```
A6 ──┐
A1 ──┼─→ A7 (CHANGELOG references all done work)
A2 ──┤
A3 ──┤
A4 ──┤
A5 ──┤
A10 ─┤
A8 ──┤
A9 ──┘
A11 last (fd-lock dep + race test)
```

A11 goes last because it adds a runtime dependency and a potentially flaky race test; everything else lands first so that green CI is the established baseline before the lock is introduced.

## Risks

- **A4 task_id length change:** new ids are longer, but nothing parses them as fixed-width; verified by reading the CLI/MCP code paths with `smart_read`.
- **A5 `stub` removal:** technically a schema change, but `stub` has always been `false` since Phase 1. Documented as non-breaking in CHANGELOG; if any downstream tool actually reads it, we revert in 0.1.5.
- **A11 fd-lock on Windows:** `fd-lock` uses `LockFileEx` on Windows, whose behavior differs from `flock` on Linux; the race test must run on both.
- **A2 swallowing real corruption:** mitigation — log at `warn!` level with line number and parse error.
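The A2 skip-and-warn behaviour, including the line-number mitigation above, can be sketched independently of the real event schema. Assumptions: the parser is passed in as a closure (the real `rebuild_state` presumably deserializes its own event type), blank lines are skipped silently as a design choice of this sketch, and `eprintln!` stands in for `tracing::warn!` so the example has no dependencies.

```rust
use std::fmt::Display;

/// Outcome of ingesting a JSONL stream: parsed records plus the number of skipped lines.
pub struct Ingested<T> {
    pub records: Vec<T>,
    pub skipped: usize,
}

/// Parse every non-blank line with `parse`. On failure, warn with the 1-based
/// line number and the parse error, then continue instead of aborting.
pub fn ingest_lines<T, E: Display>(
    input: &str,
    parse: impl Fn(&str) -> Result<T, E>,
) -> Ingested<T> {
    let mut out = Ingested { records: Vec::new(), skipped: 0 };
    for (idx, line) in input.lines().enumerate() {
        if line.trim().is_empty() {
            continue; // blank lines (e.g. a trailing newline) are not treated as corruption
        }
        match parse(line) {
            Ok(record) => out.records.push(record),
            Err(err) => {
                // Stand-in for a `tracing::warn!` call carrying the line number and error.
                eprintln!("warn: skipping malformed line {}: {}", idx + 1, err);
                out.skipped += 1;
            }
        }
    }
    out
}
```

`records.len()` plays the role of the "total parsed count" the plan says is returned; keeping `skipped` separate lets a caller surface how much was dropped.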

## Verification (per task)

1. `cargo fmt --all --check`
2. `cargo clippy --workspace --all-targets -- -D warnings`
3. `cargo test --workspace --all-targets` (specific test for the touched module)
4. `git diff --stat` reviewed (no unintended line-ending or whitespace flips)
5. Commit with conventional-commit prefix (`fix:`, `chore:`, `docs:`, `ci:`, `feat:`)
6. `bd update <id> --status closed --reason "<one-line>"`

## Final verification (epic-level)

- `cargo test --workspace --all-targets` green
- `cargo audit` clean
- `bd list --parent <epic-id> --status open` returns empty
- `git log --oneline 8c49785..HEAD` matches the 11 tasks 1:1
- `gh pr create` opened against `main` with the CHANGELOG entry as body
58 changes: 58 additions & 0 deletions .docs/plans/2026-05-06-v0.2.0-epic-c-pr-body.md
@@ -0,0 +1,58 @@
## Summary

Epic C — quality / DX / community polish. **8 atomic commits** on `claude/v0.2.0-epic-c`, built off `claude/v0.2.0-epic-b` HEAD.

> **Merge order:** epic A → main, then epic B (rebased on main), then this branch (rebased on main).

Plan: [`.docs/plans/2026-05-06-v0.2.0-epic-c-quality.md`](./.docs/plans/2026-05-06-v0.2.0-epic-c-quality.md)

### What changed

**Classifier quality**
- `feat(classifier)` — six few-shot Input/Output examples in the prompt covering the harder boundary calls (hypothesis vs finding, finding vs evidence, decision vs hypothesis). Prompt-budget guard still passes.
- `test(classifier)` — 30-row labeled eval fixture + opt-in accuracy gate (`TJ_CLASSIFIER_EVAL=on`). Default mode runs hermetic shape tests; opt-in mode calls `ClaudeCliClassifier::default()` and asserts ≥ 0.70 accuracy. Floor will ratchet up after 100+ dogfood examples.

**User-facing DX**
- `feat(cli)` — `task-journal doctor` self-check command with human + `--json` output. Reports claude-on-PATH, data-dir writability, known projects, schema migrations.
- `feat(cli)` — `task-journal migrate-project --from PATH --to PATH [--force]`. Renames JSONL/SQLite/metrics from old project_hash to new; UPDATEs `tasks.project_hash` and `index_state.project_hash` columns in SQLite.
- `feat(export)` — `export --format html` produces a self-contained timeline page (inline CSS, no external assets, dark-mode aware via `prefers-color-scheme`).

**OSS / coverage / Windows**
- `chore` — `CONTRIBUTING.md`, `CODE_OF_CONDUCT.md` (Contributor Covenant 2.1 ref), three `.github/ISSUE_TEMPLATE/*`, `.github/PULL_REQUEST_TEMPLATE.md`, README links.
- `ci` — `cargo-llvm-cov` job + Codecov upload + README badge. Non-blocking initially.
- `test(classifier)` — cross-platform fake-claude shim (`.sh`/`cat` on Unix, `.cmd`/`type` on Windows). The two `ClaudeCliClassifier` tests now run on all three operating systems in the CI matrix instead of `cfg(unix)` only.
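The shim idea is small enough to sketch. Only the `.sh`/`cat` versus `.cmd`/`type` split comes from the PR; the helper name `fake_claude_shim` and the file names are hypothetical, and the fixture path is assumed to contain no quote characters.

```rust
use std::path::Path;

/// Returns (file name, script body) for a fake `claude` that prints `fixture` to stdout.
/// Hypothetical helper; assumes the fixture path contains no quote characters.
pub fn fake_claude_shim(windows: bool, fixture: &Path) -> (&'static str, String) {
    if windows {
        // Batch file: `type` writes the fixture to stdout; CRLF line endings for cmd.exe.
        (
            "fake-claude.cmd",
            format!("@echo off\r\ntype \"{}\"\r\n", fixture.display()),
        )
    } else {
        // POSIX shell script: `cat` writes the fixture to stdout.
        // The caller still has to set the executable bit (e.g. mode 0o755) after writing it.
        (
            "fake-claude.sh",
            format!("#!/bin/sh\ncat '{}'\n", fixture.display()),
        )
    }
}
```

A test would pick the variant with `cfg!(windows)`, write the body into a temp directory, and point the classifier at that file; how `ClaudeCliClassifier` is told which binary to run is not shown in the PR, so that wiring is left out.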

### Verification

- `cargo fmt --all -- --check` ✅
- `cargo clippy --workspace --all-targets -- -D warnings` ✅
- `cargo test --workspace --all-targets` ✅ — **202 tests** (was 193 from epic B; +9 added by this PR)
- `cargo bench --workspace --no-run` ✅

### New CLI surface

| Command | Purpose |
|---------|---------|
| `task-journal doctor [--json]` | Diagnostic check; non-zero exit on issues |
| `task-journal migrate-project --from PATH --to PATH [--force]` | Re-key on-disk data when project moves |
| `task-journal export --format html [--task ID]` | Self-contained HTML timeline |
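A minimal sketch of how `doctor` might turn individual checks into the "non-zero exit on issues" contract, using only `std`. The `Check` type and helper names are hypothetical, and only two of the reported checks (binary on PATH, data-dir writability) are shown; known projects and schema migrations are omitted.

```rust
use std::ffi::OsStr;
use std::fs;
use std::path::{Path, PathBuf};

/// Outcome of one diagnostic (hypothetical shape).
pub struct Check {
    pub name: &'static str,
    pub ok: bool,
    pub detail: String,
}

/// Look for `bin` in each directory of a PATH-style value.
/// On Windows a real lookup would also need to try extensions such as `.exe`.
pub fn find_in_path(bin: &str, path_var: &OsStr) -> Option<PathBuf> {
    std::env::split_paths(path_var)
        .map(|dir| dir.join(bin))
        .find(|candidate| candidate.is_file())
}

/// Probe writability by creating and then removing a scratch file.
pub fn check_dir_writable(dir: &Path) -> Check {
    let probe = dir.join(".tj-doctor-probe");
    let result = fs::write(&probe, b"ok").and_then(|_| fs::remove_file(&probe));
    Check {
        name: "data dir writable",
        ok: result.is_ok(),
        detail: match result {
            Ok(()) => dir.display().to_string(),
            Err(e) => format!("{}: {}", dir.display(), e),
        },
    }
}

/// Exit 0 only when every check passed.
pub fn exit_code(checks: &[Check]) -> i32 {
    if checks.iter().all(|c| c.ok) { 0 } else { 1 }
}
```

In a binary, the real PATH would come from `std::env::var_os("PATH")`; taking it as a parameter keeps the lookup testable without touching the process environment.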

### New env vars

| Var | Effect |
|-----|--------|
| `TJ_CLASSIFIER_EVAL=on` | Enables the real-classifier accuracy run in `cargo test`. Default OFF — CI stays hermetic. |
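The opt-in switch and the 0.70 floor can be expressed with integer arithmetic so that exactly 70% passes without floating-point edge cases. This is a sketch; the function names and the `(expected, predicted)` pair representation are assumptions, not the eval test's actual code.

```rust
/// True only when the opt-in variable is exactly "on"; anything else keeps CI hermetic.
pub fn eval_enabled(var: Option<&str>) -> bool {
    var == Some("on")
}

/// Compare accuracy over (expected, predicted) label pairs against a floor given
/// in whole percent (70 means ≥ 0.70), using integers to avoid rounding surprises.
pub fn meets_floor(rows: &[(&str, &str)], floor_percent: usize) -> bool {
    if rows.is_empty() {
        return false; // an empty eval set proves nothing
    }
    let correct = rows
        .iter()
        .filter(|(expected, predicted)| expected == predicted)
        .count();
    correct * 100 >= floor_percent * rows.len()
}
```

Reading the variable would look like `eval_enabled(std::env::var("TJ_CLASSIFIER_EVAL").ok().as_deref())`. On a 30-row fixture, the floor means at least 21 rows must match.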

### Test plan

- [ ] Branch CI green on all three OSes for `test`, `msrv`, `audit`, `benches-compile`, `coverage` (new).
- [ ] Try `task-journal doctor` on a clean VM — confirms claude-binary detection and dir-writability checks.
- [ ] Move a project on disk, run `migrate-project`, confirm `task_pack` works in the new location.
- [ ] `task-journal export --format html --task tj-X > timeline.html` and open in browser; verify dark mode + no broken layout.
- [ ] (Optional, manual) `TJ_CLASSIFIER_EVAL=on cargo test classifier_meets_accuracy_floor` against the real `claude` CLI; record baseline accuracy.

### After this lands

`v0.2.0` final tag. No further code changes expected — the dogfood window from `0.2.0-rc.1` already exercised epic B; this epic is additive and behind-the-scenes for almost every existing user.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
84 changes: 84 additions & 0 deletions .docs/plans/2026-05-06-v0.2.0-epic-c-quality.md
@@ -0,0 +1,84 @@
# Epic C — v0.2.0: quality, DX, community polish

**Date:** 2026-05-06
**Branch:** `claude/v0.2.0-epic-c` (off `claude/v0.2.0-epic-b` HEAD)
**Target release:** `0.2.0` final (after epic B's `rc.1` is dogfooded and this PR merges)
**Bd epic:** `claude-memory-1yc`

## Goal

Three thematic threads, deliberately bundled because none alone deserves a major version but together they raise the project from "works" to "feels finished":

1. **Classifier quality** — make the auto-capture hook trustworthy with few-shot prompting and a regression-gated accuracy floor.
2. **User-facing DX** — `doctor` (diagnostic), `migrate-project` (path moved), HTML timeline (PR review).
3. **Community / coverage / Windows** — OSS hygiene files, llvm-cov badge, Windows test parity for the CLI classifier.

## Success criteria

1. `cargo test --workspace --all-targets` green on all three operating systems, including the previously skipped `cfg(unix)`-only classifier tests.
2. New `tests/classifier_eval.rs` runs against a checked-in labeled dataset and enforces an accuracy floor; CI fails when the floor is broken.
3. `task-journal doctor` exits 0 on a healthy install and emits a machine-readable summary that flags missing `claude` CLI / unwritable data dirs / unknown migrations.
4. `task-journal migrate-project --from <old> --to <new>` re-keys the JSONL + SQLite + metrics for the new project hash; round-trips through `task_pack`.
5. `task-journal export --format html --task <id>` emits a self-contained HTML timeline.
6. Coverage report: `cargo llvm-cov --workspace` runs in CI and uploads to Codecov; README badge reflects the status.
7. `CONTRIBUTING.md`, `CODE_OF_CONDUCT.md`, `.github/ISSUE_TEMPLATE/*`, `.github/PULL_REQUEST_TEMPLATE.md` exist and link from README.
8. PR opened against `main` (after `0.2.0-rc.1` is in main).
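Criterion 5's "self-contained" requirement (inline CSS, no external assets, dark mode via `prefers-color-scheme` per the PR body) can be met by escaping event text and emitting a single `<style>` block. This is only a sketch: the `(timestamp, kind, text)` tuple and the markup are hypothetical, not the real `html_timeline` helper.

```rust
/// Escape the HTML-significant characters so event text cannot inject markup.
pub fn escape_html(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for ch in s.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#39;"),
            _ => out.push(ch),
        }
    }
    out
}

/// Render a self-contained page: inline CSS only, no scripts, fonts, or images.
pub fn render_timeline(title: &str, events: &[(&str, &str, &str)]) -> String {
    let mut html = String::new();
    html.push_str("<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\">");
    html.push_str(&format!("<title>{}</title>", escape_html(title)));
    // Line continuations keep the CSS readable; the literal has no embedded newlines.
    html.push_str(
        "<style>\
         body{font-family:system-ui,sans-serif;background:#fff;color:#111}\
         li{margin:.5em 0}\
         @media (prefers-color-scheme: dark){body{background:#111;color:#eee}}\
         </style></head><body>",
    );
    html.push_str(&format!("<h1>{}</h1><ol>", escape_html(title)));
    for (ts, kind, text) in events {
        html.push_str(&format!(
            "<li><time>{}</time> <strong>{}</strong> {}</li>",
            escape_html(ts),
            escape_html(kind),
            escape_html(text)
        ));
    }
    html.push_str("</ol></body></html>\n");
    html
}
```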

## Non-goals (deferred)

- Opt-in telemetry endpoint (requires hosted backend — separate decision).
- C/C++/server-side LSP integration.
- Multi-language classifier prompts.

## Tasks (8)

| # | Task | Touches | Test? | Notes |
|---|------|---------|-------|-------|
| C1 | OSS hygiene files: `CONTRIBUTING.md`, `CODE_OF_CONDUCT.md`, issue + PR templates | repo root + `.github/` | n/a | Standard OSS scaffolding; not blocking other work. |
| C2 | `cargo-llvm-cov` job in CI + Codecov upload + README badge | `.github/workflows/ci.yml`, `README.md` | n/a | Non-blocking initially; flip to a blocking threshold after 5 baseline runs. |
| C3 | Windows-compatible tests for `ClaudeCliClassifier` (currently `cfg(all(test, unix))`) | `crates/tj-core/src/classifier/cli.rs` | yes (port the two existing fake-claude tests to use `.cmd`/`.bat` shim on Windows) | Closes the platform gap noticed in the audit. |
| C4 | `task-journal doctor` command | `tj-cli/src/main.rs`, possibly small `tj-core::diagnostics` mod | yes (CLI integration test) | Checks: claude bin in PATH, data dirs writable, schema_migrations matches expected, last_indexed_event_id consistent. |
| C5 | `task-journal migrate-project --from <path> --to <path>` | `tj-cli/src/main.rs`, `tj-core::project_hash`, fs ops | yes | Renames `<old_hash>.jsonl`, `<old_hash>.sqlite`, the metrics `<old_hash>.jsonl`, etc. |
| C6 | `task-journal export --format html [--task <id>]` | `tj-cli/src/main.rs` (existing `export` command), new tiny `html_timeline` helper | yes | Self-contained: inline CSS, no external assets. |
| C7 | Few-shot prompting in classifier | `tj-core/src/classifier/prompt.rs` | yes (prompt contains 6 examples; size still bounded < 64KB) | 2 examples per harder pair: hypothesis vs finding, finding vs evidence, decision vs hypothesis. |
| C8 | Classifier eval dataset + accuracy gate | `tj-core/tests/classifier_eval.rs`, `tj-core/tests/fixtures/classifier_eval.jsonl` | yes (eval test enforces ≥ 70% baseline) | Hand-label ~30 chunks; uses `MockClassifier` + golden expected outputs to keep the default run deterministic; the real-classifier path stays opt-in via env var so CI does not need API access. |
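C7's prompt change reduces to appending labeled examples and checking the size budget. A minimal sketch, assuming a hypothetical `Example` type and an `Input:`/`Output:` layout; the 64 KB bound comes from the C7 row, while the real prompt text lives in `tj-core/src/classifier/prompt.rs` and may be structured differently.

```rust
/// One labeled example shown to the model (hypothetical shape).
pub struct Example<'a> {
    pub input: &'a str,
    pub output: &'a str,
}

/// Budget from the C7 row: the assembled prompt must stay under 64 KB.
pub const PROMPT_BUDGET_BYTES: usize = 64 * 1024;

/// Instructions, then few-shot examples, then the chunk to classify.
/// Returns `None` when the assembled prompt would exceed the budget.
pub fn build_prompt(instructions: &str, examples: &[Example<'_>], chunk: &str) -> Option<String> {
    let mut prompt = String::with_capacity(instructions.len() + chunk.len() + 256);
    prompt.push_str(instructions);
    prompt.push_str("\n\n");
    for ex in examples {
        prompt.push_str("Input: ");
        prompt.push_str(ex.input);
        prompt.push_str("\nOutput: ");
        prompt.push_str(ex.output);
        prompt.push_str("\n\n");
    }
    // The chunk reuses the example layout so the model completes the final "Output:".
    prompt.push_str("Input: ");
    prompt.push_str(chunk);
    prompt.push_str("\nOutput:");
    (prompt.len() < PROMPT_BUDGET_BYTES).then_some(prompt)
}
```

Returning `None` rather than truncating keeps the budget guard a hard failure, which is easy to assert on in a test.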

## Sequencing

```
C1 ─┐
C2 ─┤
C3 ─┼─→ (independent)
C4 ─┤
C5 ─┤
C6 ─┘

C7 ─→ C8 (eval validates the new prompt against the dataset)
```

C1/C2/C3 can land in any order. C4/C5/C6 are independent CLI features. C7 unlocks C8 (the eval dataset is the way to *measure* that few-shot improved precision rather than degraded it).

## Risks

- **C7 prompt regression:** few-shot can over-fit examples and degrade on out-of-distribution chunks. Mitigation: eval set in C8 covers boundary cases (`hypothesis-not-finding`, etc).
- **C8 false confidence:** ≥70% on 30 examples is a noisy estimate. Mitigation: ratchet floor up only after collecting 100+ labeled examples in dogfooding.
- **C5 destructive migration:** if `--from` and `--to` resolve to the same hash (symlink, case-insensitive FS), we'd corrupt data. Mitigation: refuse when `from_hash == to_hash`; require `--force` to overwrite an existing destination.
- **C3 Windows shim:** the fake-claude test could be rewritten in PowerShell, `.cmd`, or Python; pick `.cmd` for minimal surface. Skipping these tests on Windows hosts that lack `cmd.exe` is acceptable.
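The C5 mitigation amounts to a preflight check that runs before any file is renamed. A sketch with hypothetical names: hashes are taken as already computed (the plan does not say how `project_hash` is derived), and the rename step covers only `<hash>.<ext>` files in a single directory.

```rust
use std::fs;
use std::io;
use std::path::Path;

#[derive(Debug, PartialEq)]
pub enum MigrateError {
    /// Source and destination resolve to the same key (symlink, case-insensitive FS, ...).
    SameHash,
    /// Destination data already exists and `--force` was not given.
    DestinationExists,
}

/// Refuse unsafe migrations before touching the filesystem.
pub fn preflight(
    from_hash: &str,
    to_hash: &str,
    dest_exists: bool,
    force: bool,
) -> Result<(), MigrateError> {
    if from_hash == to_hash {
        return Err(MigrateError::SameHash); // `--force` does not override this
    }
    if dest_exists && !force {
        return Err(MigrateError::DestinationExists);
    }
    Ok(())
}

/// Rename `<old>.<ext>` to `<new>.<ext>` for each extension present in `dir`.
/// Returns how many files were moved.
pub fn rekey_files(dir: &Path, old: &str, new: &str, exts: &[&str]) -> io::Result<usize> {
    let mut moved = 0;
    for ext in exts {
        let src = dir.join(format!("{old}.{ext}"));
        if src.exists() {
            fs::rename(&src, dir.join(format!("{new}.{ext}")))?;
            moved += 1;
        }
    }
    Ok(moved)
}
```

A full implementation would also update the `tasks.project_hash` and `index_state.project_hash` columns in SQLite and handle the metrics directory, as the PR body describes; those steps are omitted here.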

## Verification gate (per task)

Same as Epic A/B:
1. `cargo fmt --all -- --check`
2. `cargo clippy --workspace --all-targets -- -D warnings`
3. `cargo test --workspace --all-targets`
4. `git diff --stat` review
5. Conventional-commit prefix
6. `bd close <id> --reason "..."`

## Final verification (epic-level)

- All 8 sub-tasks closed in bd
- `cargo bench --workspace --no-run` clean
- `cargo llvm-cov --workspace --summary-only` reports a number
- `task-journal doctor` runs locally and prints the diagnostics
- PR body lists which features changed user-facing CLI surface