v0.2.1: hardening + perf + DX + operational maturity #11

Merged
Shahinyanm merged 39 commits into main from claude/v0.2.1-epic-d
May 7, 2026
@Shahinyanm

Member

Summary

Epic D: v0.2.1 operational maturity. 9 atomic commits on claude/v0.2.1-epic-d, built off claude/v0.2.0-epic-c HEAD. Non-breaking: a patch-level bump after 0.2.0.

Merge order: epic A → main, epic B (rebased) → main, epic C (rebased) → main → tag v0.2.0. Then this branch (rebased) → main → tag v0.2.1.

Plan: .docs/plans/2026-05-07-v0.2.1-epic-d-operational.md

What changed

Performance

  • perf(mcp) — process-wide Arc<Mutex<rusqlite::Connection>> cache keyed by state path. First call opens; later calls reuse. Eliminates per-call PRAGMA + migrations replay.

User-facing DX

  • feat(export): task-journal export --format sqlite produces a clean VACUUM-based snapshot, streamed to stdout so it can be redirected (> backup.sqlite). The snapshot round-trips through task-journal pack from a fresh XDG data directory.
  • feat(cli): task-journal pending list and pending retry surface auto-capture-hook failures that used to sit silently in pending/. Each entry carries an attempts counter; after 3 failures it is renamed to <id>.dead.json, so it stops being retried but still appears in list.

Observability

  • feat(mcp): structured tracing with a correlation_id per tool call. Two log lines (start, then ok or err) wrap each handler; start and ok are INFO, err is WARN. At the default RUST_LOG=info, grepping for a correlation_id isolates one request.
  • feat(mcp) — graceful Ctrl-C / SIGTERM (Unix only) shutdown via tokio::select! between rmcp serve loop and wait_for_shutdown_signal().

Quality

  • test(mcp) — rmcp client + transport compile-and-shape integration test. Full E2E roundtrip deferred to follow-up claude-memory-yj1.8 (needs TaskJournalServer extracted into a lib target — out of scope for D).

Release

  • release — workspace version 0.2.0-rc.1 → 0.2.1; CHANGELOG entry.

Verification

  • cargo fmt --all -- --check
  • cargo clippy --workspace --all-targets -- -D warnings
  • cargo test --workspace --all-targets ✅ — 213 tests (was 202 from epic C; +11 added by this PR)
  • cargo bench --workspace --no-run
  • cargo build --workspace --release ✅ — 0.2.1 binaries

New CLI surface

| Command | Purpose |
| --- | --- |
| task-journal export --format sqlite | VACUUM-based clean SQLite snapshot to stdout |
| task-journal pending list | List queued classifier failures |
| task-journal pending retry [--mock-*] | Re-feed pending entries; mark dead after 3 |

New env vars

None beyond what the existing RUST_LOG already controls. The structured-tracing output is tied to it.

Test plan

  • Branch CI green on three OS (test, msrv, audit, benches-compile, coverage).
  • Smoke run task-journal-mcp from a real MCP client; observe tool_call start/ok lines in stderr; SIGTERM exits 0 within ~1s.
  • task-journal export --format sqlite > backup.sqlite then sqlite3 backup.sqlite '.schema' shows the v001+v002 tables.
  • After dogfooding 0.2.0 + this branch for ~3 days, tag v0.2.1 and cargo publish (after rebasing on main once epics A/B/C are landed).

Out of scope / deferred

  • claude-memory-yj1.8 — extract TaskJournalServer into a tj-mcp library target. Unblocks the full E2E rmcp roundtrip test we deferred from D4. Tracked as a side-quest for a future epic.
  • Telemetry endpoint (still requires hosted backend).
  • task-journal compact (lifecycle archival of closed tasks) — wants a design pass, deferred.

🤖 Generated with Claude Code

Shahinyanm added 30 commits May 6, 2026 14:42
Pre-existing regressions on rustc 1.95 baseline (useless_vec,
unnecessary_sort_by, rustfmt style adjustments) blocked the CI gate.

Applied: cargo fmt --all + cargo clippy --fix + one manual
sort_by_key migration in session/discovery.rs.

No semantic changes; formatting and lint compliance only.

Closes claude-memory-iyn.12
Documents the 11-task hardening epic landing as 0.1.4: HTTP timeout,
graceful JSONL skip, env-var classifier model, longer task_id, drop
stub field, SCHEMA_VERSION const, CHANGELOG.md, cargo-audit CI, MSRV
job, .editorconfig, JSONL file-lock.

Backwards-compatible only. Breaking and perf changes deferred to
epic B (v0.2.0). Quality/DX deferred to epic C.

Refs claude-memory-iyn
A single bad line in the events JSONL would abort the whole rebuild
transaction, leaving SQLite empty and re-aborting on every retry.
For an append-only journal this is too brittle.

Now: malformed lines are logged via tracing::warn and skipped; SQL
errors still propagate (those indicate schema/integrity problems).
Returned count reflects only successfully-indexed events.

Adds tracing dep to tj-core (workspace dep already declared).
New test rebuild_state_skips_malformed_jsonl_lines covers both
non-JSON garbage and valid-JSON-but-not-an-Event cases.

Closes claude-memory-iyn.2
The HTTP classifier built the request without any timeout, so on a
stalled network or rate-limit lockup the call would hang indefinitely.
Hooks wrap classifier calls in || true, but that protects against
exit codes, not against blocked turns.

Adds AnthropicClassifier::timeout field (default 15s via DEFAULT_TIMEOUT
const). Used in the ureq Request chain.

Test classifier_times_out_on_unresponsive_server binds a TCP socket
that completes the handshake but never replies; with timeout=300ms
the call must Err in well under 3s.

Closes claude-memory-iyn.1
The schema-version string "1.0" was inlined at four production
sites (event.rs, pack.rs x2, tj-mcp main.rs). Bumping the version
required four search-replaces — one of them being in another crate.

Now: pub const SCHEMA_VERSION in tj-core::lib, referenced from all
four sites. Test pack_assembler_does_not_inline_schema_version_literal
guards against future regressions by scanning pack.rs source.

Closes claude-memory-iyn.6
Six base32 chars from a ULID give ~30 bits of entropy, which is
~32k tasks before a 50% collision risk under the birthday paradox.
For a long-lived project journal this is uncomfortably close.

Now: tj_core::new_task_id() helper produces "tj-" + 10 chars
(~50 bits, ~33M threshold). Used in tj-cli, tj-mcp, and the
session backfill extractor — replaces three slightly-different
inline copies.

Old 6-char IDs continue to work since storage keys are opaque
strings; this only affects newly-generated tasks.

Tests: shape check + 10k uniqueness sweep.

Closes claude-memory-iyn.4
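
The threshold quoted for the 10-char IDs can be reproduced with the usual square-root rule of thumb. This is a minimal sketch, not code from tj-core; the function names are hypothetical and it assumes 5 bits per base32 character.

```rust
/// Bits of entropy in `chars` base32 characters (5 bits each).
fn base32_bits(chars: u32) -> u32 {
    chars * 5
}

/// Rough number of random IDs at which a collision becomes about as likely
/// as not, using the sqrt(2^bits) rule of thumb (hypothetical helper).
fn birthday_threshold(bits: u32) -> f64 {
    2f64.powf(bits as f64 / 2.0)
}
```

For 10 characters this gives 50 bits and a threshold of 2^25, about 33.5M IDs, which matches the ~33M figure in the commit message.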
Phase-1 left every MCP result type with a stub:bool flag that was
always false in production. The field was never read by any client
and made every JSON payload look unfinished.

Removed from TaskPackResult, TaskPackMetadata, TaskSearchResult,
TaskCreateResult, EventAddResult, TaskCloseResult and their eight
in-place initializations. Regression test no_response_serializes_a_
stub_field guards against re-introduction.

Technically a JSON shape change, but stub was a write-only field
with no documented consumers — clients reading these payloads will
see one fewer key, never an unexpected one.

Closes claude-memory-iyn.5
Standard OSS hygiene file. LF line endings, UTF-8, final newline,
trim trailing whitespace, 4-space rust, 2-space yaml/toml/md/json,
2-space sh, tab Makefile. Cargo.lock and *.jsonl carve-outs.

Closes claude-memory-iyn.10
Cargo.toml declares rust-version = 1.83 but the existing CI only
tested @stable, so an accidental new-feature use would slip in
silently and break downstream consumers locked to MSRV.

New msrv job: ubuntu-latest with dtolnay/rust-toolchain@1.83,
cargo build + cargo test on the full workspace. Separate cache
key so it does not collide with the stable job.

Closes claude-memory-iyn.9
Both the subscription (claude -p) and Anthropic API classifiers
hardcoded their model alias. When Anthropic deprecates a model the
classifier silently breaks until a release ships.

Now: each classifier reads TJ_CLASSIFIER_MODEL with backend-specific
default (haiku alias for CLI, claude-haiku-4-5-20251001 for API).
DEFAULT_MODEL constants exposed for tests and external override.

Test tj_classifier_model_env_var_overrides_defaults_for_both_backends
combines both backends into one serialized read-set-restore flow to
avoid env-var races between concurrent test threads.

README documents the new env var.

Closes claude-memory-iyn.3
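
A minimal sketch of the override logic described above, using only std. The constant and function names are hypothetical (the commit only says DEFAULT_MODEL constants exist), and treating an empty value as unset is an assumption. The lookup is split from the resolution so the resolution can be tested without touching process-wide env state.

```rust
// Hypothetical names; the defaults are the ones named in the commit message.
const DEFAULT_CLI_MODEL: &str = "haiku";
const DEFAULT_API_MODEL: &str = "claude-haiku-4-5-20251001";

/// Pick the env-provided model if it is set and non-blank, else the
/// backend-specific default. (Blank-means-unset is an assumption.)
fn resolve_model(env_value: Option<&str>, backend_default: &str) -> String {
    match env_value.map(str::trim) {
        Some(v) if !v.is_empty() => v.to_string(),
        _ => backend_default.to_string(),
    }
}

/// Read TJ_CLASSIFIER_MODEL once and resolve it against a default.
fn classifier_model(backend_default: &str) -> String {
    let from_env = std::env::var("TJ_CLASSIFIER_MODEL").ok();
    resolve_model(from_env.as_deref(), backend_default)
}
```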
A published crate without supply-chain auditing is a rough edge.
The existing CI ran fmt, clippy, test, and doc but had no advisory
gate.

New audit job uses rustsec/audit-check@v2 against RUSTSEC. Marked
continue-on-error initially so an existing transitive-dep advisory
does not block unrelated PRs; once the first run is green we remove
the flag and make audit blocking.

Closes claude-memory-iyn.8
…e Windows)

POSIX append on Linux is atomic for writes <= PIPE_BUF, but Windows
makes no such guarantee. Two writers (auto-capture hook + manual
task-journal event + MCP server) racing on the same JSONL file
could interleave bytes mid-line, corrupting the source of truth.

Now: JsonlWriter wraps the file in fd_lock::RwLock; append and
flush_durable each acquire an exclusive advisory lock for the
duration of the write/sync. Cross-platform: flock on Linux/macOS,
LockFileEx on Windows.

Removed the BufWriter wrapper: for a journal seeing a handful of
events per minute, a syscall per write is unmeasurable, and
buffering with locks added complexity without real benefit.

Test concurrent_appends_do_not_interleave_bytes spawns 8 threads
each owning its own JsonlWriter (own File handle, own fd_lock
instance) and writing 100 events. Asserts 800 well-formed Events.
Closes the loop on race-free behavior on both platforms.

Closes claude-memory-iyn.11
Backfills release notes for the prior four crates.io releases from
git history and adds the full v0.1.4 entry summarizing this epic
(11 tasks plus a baseline lint cleanup).

Linked from the README. Compare links target the GitHub repo so
they resolve once the v0.1.4 tag is pushed.

Closes claude-memory-iyn.7
Replaces the single MIGRATION_001 const + execute_batch() pattern
with a forward-only migrations registry tracked in a schema_migrations
table (version, applied_at). Each declared migration runs at most
once per database; reopening an existing DB is a no-op for migrations.

Foundation for B2 (incremental indexing introduces a new index_state
table via migration v002 — would require this table-of-versions
contract anyway, so it lands first).

Backwards-compatible for existing 0.1.x databases: schema_migrations
starts empty, v001 SQL re-runs against IF NOT EXISTS tables harmlessly,
and the v=1 row is recorded on first 0.2.0 open.

Tests: fresh_db_runs_all_migrations + apply_migrations_is_idempotent_
across_reopens cover both the fresh and upgrade paths.

Refs claude-memory-gyq.1
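
The run-at-most-once contract can be sketched without SQLite. In this illustration a BTreeSet stands in for the schema_migrations table and the SQL strings are placeholders; none of it is the actual tj-core code.

```rust
use std::collections::BTreeSet;

/// One forward-only migration: a version number plus the SQL it would run.
struct Migration {
    version: u32,
    sql: &'static str,
}

/// Placeholder registry; the real statements live in tj-core.
const MIGRATIONS: &[Migration] = &[
    Migration { version: 1, sql: "CREATE TABLE IF NOT EXISTS tasks (...)" },
    Migration { version: 2, sql: "CREATE TABLE IF NOT EXISTS index_state (...)" },
];

/// Run every migration whose version is not yet recorded in `applied`
/// (the stand-in for schema_migrations) and return the versions that ran.
fn apply_migrations(applied: &mut BTreeSet<u32>, mut run: impl FnMut(&str)) -> Vec<u32> {
    let mut newly_applied = Vec::new();
    for m in MIGRATIONS {
        if !applied.contains(&m.version) {
            run(m.sql);
            applied.insert(m.version);
            newly_applied.push(m.version);
        }
    }
    newly_applied
}
```

Calling it twice against the same set runs nothing the second time, which is the behaviour the idempotency test checks on a real database.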
…t marker

Every MCP tool call (task_pack, task_search) re-read the entire JSONL
log on every invocation and replayed it through events_index/search_fts.
At 10k events that is seconds per call; at 100k it is unworkable.

Schema: migration v002 adds index_state(project_hash PK, last_indexed_
event_id, updated_at). rebuild_state and the new ingest_new_events
both update this row to the most recent event_id they wrote.

Behavior: ingest_new_events scans to the marker and applies only the
tail. Two safe fall-back paths to a full rebuild_state:
  • no marker yet (first call after migration v002)
  • marker not present in JSONL (file was rewritten or hand-edited)
The fallback path emits a tracing::warn so corruption is visible.

Switched five callers (mcp::task_pack, mcp::task_search, cli::pack,
cli::ingest-hook, cli::search) to ingest_new_events. The explicit
 CLI command retains the full rebuild
semantics — it is the recovery escape hatch.

Tests:
  • ingest_new_events_picks_up_only_new_lines (3 + 2 events;
    second pass reads only the 2 new lines).
  • ingest_new_events_falls_back_to_full_rebuild_when_marker_vanishes.
  • rebuild_state_and_ingest_new_events_produce_same_state (golden
    equivalence comparison).

Refs claude-memory-gyq.2
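
The marker logic and its two fall-back paths can be illustrated on a plain list of event ids. This is a sketch under simplifying assumptions: the type and function names are hypothetical, and the real ingest_new_events works on the JSONL file and SQLite rather than an in-memory slice.

```rust
/// What an incremental ingest should do for a given log and marker.
#[derive(Debug, PartialEq)]
enum IngestPlan<'a> {
    /// Marker found: index only the events after it.
    Tail(&'a [String]),
    /// No marker yet, or marker missing from the log: replay everything.
    FullRebuild(&'a [String]),
}

/// Decide between tail ingest and full rebuild (hypothetical helper).
fn plan_ingest<'a>(event_ids: &'a [String], marker: Option<&str>) -> IngestPlan<'a> {
    let found = marker.and_then(|m| event_ids.iter().position(|id| id.as_str() == m));
    match found {
        Some(idx) => IngestPlan::Tail(&event_ids[idx + 1..]),
        None => IngestPlan::FullRebuild(event_ids),
    }
}
```

With a five-event log and the marker on the third event, only the last two ids are returned, mirroring the 3 + 2 case in the first test above.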
…ingest

Before B2 every MCP call ran a full rebuild_state which replayed
every event through index_event(), and index_event() invalidates the
pack cache for that task. So pack-cache rows lived for milliseconds
at most — never reused.

After B2 ingest_new_events only processes the JSONL tail. When there
are no new events at all, no index_event runs, no cache rows are
DELETEd, and the next assemble() returns metadata.cache_hit = true.

The fix is implicit (it falls out of B2) — adding the test now so a
future regression in either ingest_new_events or index_event will
break this test rather than silently double our pack latency.

Refs claude-memory-gyq.3
Tool handlers no longer mask failures as success-typed Json with
task_id = literal [error] msg. They now return Result<Json<T>,
McpError>, so a tj_core failure surfaces as a JSON-RPC error frame
that the client can detect, log, and surface to the user.

BREAKING CHANGE: any client parsing the [error] string out of the
task_id field will see a JSON-RPC error response instead. Update by
checking for the rpc error frame before deserializing the result.

  Before:                                  After:
  task_pack -> Json<TaskPackResult>        task_pack -> Result<Json<...>, McpError>
  task_search -> Json<TaskSearchResult>    task_search -> Result<Json<...>, McpError>
  task_create -> Json<TaskCreateResult>    task_create -> Result<Json<...>, McpError>
  event_add -> Json<EventAddResult>        event_add -> Result<Json<...>, McpError>
  task_close -> Json<TaskCloseResult>      task_close -> Result<Json<...>, McpError>

Helper into_mcp_error formats the full anyhow chain (root cause +
context wraps) into the RPC error message so the client sees the
same diagnostic depth a Rust caller would.

Tests:
  - into_mcp_error_carries_full_anyhow_chain
  - task_pack_returns_rpc_error_when_state_dir_is_unusable
    (smoke: project_paths failure → into_mcp_error gives non-empty msg)

Refs claude-memory-gyq.4
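
To illustrate what carrying the full chain means, here is a std-only sketch that walks Error::source() and joins every message. The Ctx type is a stand-in for anyhow context wrapping and format_chain is a hypothetical name; the real into_mcp_error works on anyhow::Error and produces an McpError.

```rust
use std::error::Error;
use std::fmt;

/// Minimal error with an optional cause, standing in for anyhow context.
#[derive(Debug)]
struct Ctx {
    msg: String,
    source: Option<Box<dyn Error + 'static>>,
}

impl fmt::Display for Ctx {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.msg)
    }
}

impl Error for Ctx {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.source.as_deref()
    }
}

/// Join the outermost message and every underlying cause into one line.
fn format_chain(err: &(dyn Error + 'static)) -> String {
    let mut parts = vec![err.to_string()];
    let mut current = err.source();
    while let Some(cause) = current {
        parts.push(cause.to_string());
        current = cause.source();
    }
    parts.join(": ")
}
```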
Closing a non-existent task used to silently succeed: the close
event would be appended to JSONL with a task_id that has no open
event, leaving an unclosable orphan record.

Now: both the CLI Close subcommand and the MCP task_close tool
ingest_new_events first (catch up the index), then assert
task_exists() before writing the close event. Failure surfaces as
anyhow::Error in CLI (non-zero exit + stderr) and as McpError in
MCP (RPC error frame, thanks to B4).

New helpers:
  - tj_core::db::task_exists(conn, task_id) -> bool

Tests:
  - task_exists_returns_true_for_known_id_false_otherwise (unit)
  - close_unknown_task_id_returns_error (CLI integration; cargo
    bin runs in a temp XDG_DATA_HOME)

Refs claude-memory-gyq.5
The MCP server always derived the project_hash from the cwd at the
moment a tool was invoked. Monorepo and parent-dir flows had no way
to point the server at a sub-project without launching it from inside
that directory.

Now: --project-dir <PATH> on the binary CLI sets a process-wide
PROJECT_DIR_OVERRIDE (OnceLock) that every tool handler consults
ahead of cwd. Default behaviour is unchanged when the flag is omitted.

The path is canonicalized at startup so a relative arg or a symlink
becomes a stable absolute hash key.

Tests:
  - resolve_project_paths_uses_provided_dir_for_hash: factor-out
    helper proves two dirs yield two hashes and one dir is stable.
  - cli_parses_project_dir_argument: clap parser smoke for both
    presence and absence of the flag.

Refs claude-memory-gyq.6
The tokio runtime hosts a small thread pool sized to the number of
CPU cores. Synchronous SQLite + JSONL + filesystem work directly in
async fn handlers monopolised that thread for the duration of each
tool call, so two concurrent client requests serialised even on a
multicore box.

Now: every tool body is moved into a closure passed to
tokio::task::spawn_blocking via a small run_blocking() helper that
also collapses JoinError + anyhow::Error into McpError. Inside the
closure we still own + open + drop SQLite connections normally —
crucially never holding a Connection across an await, since
rusqlite::Connection is Send but not Sync.

The classifier-aware tools never directly call HTTP from the MCP
server (only the CLI does), so the synchronous ureq stays on the
blocking pool for free.

Test run_blocking_executes_two_tasks_concurrently: tokio::join! two
200ms sleep_in_blocking calls and assert wall clock < 350ms.

Refs claude-memory-gyq.7
…search

We claim B2 made hot paths O(new) instead of O(all), but every claim
without a number is a wish. Adds a criterion harness that exercises
the three paths the MCP server walks on every tool call.

Three benches, two sizes each (1k and 10k events spread across 100
synthetic tasks):
  - rebuild_state — full-rebuild baseline (the cost we used to pay
    on every MCP call before B2)
  - pack_assemble_cold — invalidates cache then recomputes
  - search_fts — FTS5 MATCH lookup

Wired into CI as a separate benches-compile job that runs cargo bench
--no-run; full timing runs are best done locally on a quiet box, not
on shared GitHub runners. Threshold gates (B2 promised <50ms pack /
<100ms rebuild on 10k) are deferred until a real CI box exists or
five baselines are collected.

Refs claude-memory-gyq.8
Last commit of epic B. Workspace version 0.1.3 -> 0.2.0-rc.1. Inner
crate dependency declarations updated to match (tj-cli and tj-mcp
both depend on tj-core).

CHANGELOG.md gets a [0.2.0-rc.1] - 2026-05-06 section with the
breaking change (MCP error contract) called out first, then Added /
Changed / Performance subsections summarising the eight feature
commits in this epic.

After dogfooding, 0.2.0 will be cut without further code changes —
the rc tag is the gating signal that we want feedback on the new
contract before it hits stable.

Closes claude-memory-gyq.9
Standard scaffolding so new contributors find the rules without
asking. Five files:

  - CONTRIBUTING.md (one-thing-per-PR, conventional commits, CI gate
    expectations, what I will not merge)
  - CODE_OF_CONDUCT.md (Contributor Covenant 2.1 reference)
  - .github/ISSUE_TEMPLATE/bug.md, feature.md, question.md
  - .github/PULL_REQUEST_TEMPLATE.md (matches CONTRIBUTING checklist)

Plan landed in .docs/plans/2026-05-06-v0.2.0-epic-c-quality.md (epic C
scope) — committed in the same change because it covers all eight C
sub-tasks rather than just this one.

README links to CONTRIBUTING / CoC / issue templates from a new
Contributing section.

Refs claude-memory-1yc.1
Adds a coverage workflow job that runs cargo llvm-cov --workspace
--lcov, then uploads via codecov-action@v4. Marked continue-on-error:
true on first land — once we collect 5 baselines and agree a floor
the gate flips to blocking.

CODECOV_TOKEN is read from GitHub secrets if present; for public
repos Codecov v4 falls back to anonymous uploads, so the job is
useful even before the secret is configured.

README gets the codecov badge alongside the existing crates.io / CI
/ License badges.

Refs claude-memory-1yc.2
The two ClaudeCliClassifier tests were gated cfg(all(test, unix)) and
silently skipped on Windows CI. Closes that platform gap.

The shim now writes the JSON envelope to a file and executes a tiny
script that prints it back: cat "PATH" on Unix (.sh + chmod 0755),
type "PATH" on Windows (.cmd batch). The type/cat form avoids the
notoriously fragile cmd-batch escaping of the envelope JSON.

Result: classifier_parses_cli_envelope_and_returns_classified_output
and classifier_surfaces_not_logged_in_with_friendly_hint now run on
all three matrix OS in CI.

Refs claude-memory-1yc.3
Self-check command for users debugging install issues. Reports five
groups of facts:

  1. claude binary in PATH (with version) — required for the
     subscription-mode classifier
  2. data dir + events/state/metrics sub-dir paths and writability
  3. known projects on this machine (count of state-dir SQLite stems)
  4. schema migrations applied for the current cwd project (if any)
  5. an issues[] list of human-readable problems

Exits 0 when issues is empty, 1 otherwise. Default output is human-
readable; --json switches to a stable machine-parseable shape.

CLI integration tests:
  - doctor_exits_zero_on_fresh_install (no events/state files yet)
  - doctor_json_output_is_parseable_and_lists_paths

Refs claude-memory-1yc.4
Project moved on disk -> canonical-path-derived hash changed -> data
orphaned. New CLI command renames the JSONL + SQLite + metrics files
from the old project_hash to the new one and updates the project_hash
columns inside the SQLite (tasks, index_state).

Refuses when --from and --to resolve to the same hash (symlink, case-
insensitive FS). Refuses to overwrite an existing destination file
unless --force is set.

CLI integration tests:
  - migrate_project_round_trips_data_to_new_path: create task in
    project A, migrate-project A -> B, pack from B finds the task.
  - migrate_project_refuses_overwrite_without_force: both have data,
    migration aborts with "destination already exists" on stderr.

Refs claude-memory-1yc.5
Adds a third format to the existing export subcommand. Renders a
self-contained HTML page (inline CSS, no external assets) showing
the task timeline grouped by task_id. Useful as a PR-review
attachment or sprint retro artefact.

Design notes:
  - All five HTML special chars (& < > " ') are escaped via
    html_escape(), so there is no XSS surface even though we never
    render third-party HTML.
  - CSS uses prefers-color-scheme so light and dark mode both look
    sane without a toggle.
  - Event type pills get a colour class (decision/rejection/evidence/
    finding) so timelines are scannable at a glance.
  - Suggested events get a trailing ? marker matching the rest of
    the codebase.

CLI integration test export_html_emits_self_contained_document:
  - DOCTYPE html present
  - task title + event text present
  - no http:// or https:// — proves no external CSS/font/script
    leaked into the output.

Refs claude-memory-1yc.6
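
A minimal sketch of an escaper for the five HTML special characters. It is illustrative only; the entity chosen for the apostrophe (&#39;) is an assumption, and the project's own html_escape() may differ.

```rust
/// Escape & < > " ' so the text is safe in element content and in
/// quoted attribute values.
fn html_escape(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#39;"),
            other => out.push(other),
        }
    }
    out
}
```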
Six worked Input/Output pairs covering the three boundary calls
the classifier gets wrong most often:

  - hypothesis vs finding (2 examples)
  - finding vs evidence (2 examples)
  - decision vs hypothesis (2 examples)

Each pair pins one half of the boundary so the model sees the
contrast inline rather than only as abstract definitions. The
examples themselves are drawn from real boundary cases observed
during this epic — keeps them representative.

The prompt budget guard (prompt_truncates_event_lines_to_keep_size
_bounded) still passes after adding ~3KB of fixed prefix, because
the recent_tasks block is the variable cost — examples are
constant-time addition.

New test prompt_contains_few_shot_examples enforces the 6-example
floor as a regression guard.

Refs claude-memory-1yc.7
Adds tests/fixtures/classifier_eval.jsonl with 30 labeled chunks
spanning all 12 event types, plus tests/classifier_eval.rs that
runs in two modes:

  - Default (CI-safe): no model API call. Asserts
      • fixture has ≥ 30 rows
      • every expected event type is one of EventType::ALL
      • prompt builder emits each input verbatim
    Hermetic and fast — runs as part of plain cargo test.

  - Opt-in (TJ_CLASSIFIER_EVAL=on): runs ClaudeCliClassifier::
    default() against every row, computes accuracy, asserts the
    0.70 floor and prints misses. Requires claude on PATH.
    Skipped silently otherwise.

Three new tests, all green by default; the real-classifier one is
silent-pass without the env var.

Refs claude-memory-1yc.8
Shahinyanm added 9 commits May 7, 2026 10:15
Every MCP tool handler used to call tj_core::db::open() which runs
PRAGMA journal_mode + foreign_keys + apply_migrations + an empty
schema_migrations SELECT on every invocation. At small N the open
cost dominates the actual work — pack/search/close all paid this
overhead even when the underlying state changed nothing.

Now: a process-wide HashMap<PathBuf, Arc<Mutex<Connection>>> guarded
by an outer OnceLock<Mutex<...>>. cached_open(path) does an O(1)
lookup, falls back to db::open() on the cold path, and shares the
Arc with future callers. Each tool handler takes the inner mutex
for the duration of its work; the outer mutex is held only for the
brief insert/lookup.

  - SQLite Connection is Send (single-threaded mode); safe to send
    across the spawn_blocking thread boundary inside an Arc.
  - Inner mutex serialises calls per project_hash. SQLite already
    serialises writes, so we accept a tiny concurrency loss in
    exchange for the open-cost saving.
  - Cache is keyed by PathBuf, so two MCPs running with different
    --project-dir do not stomp on each other.

Tests:
  - cached_open_returns_same_arc_for_same_path
  - cached_open_returns_distinct_arcs_for_distinct_paths

Refs claude-memory-yj1.1
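
The two-level locking scheme can be sketched with std alone. Connection and open() below are stand-ins so the example needs no external crates; the real code caches rusqlite::Connection values opened by tj_core::db::open(). For brevity this sketch also opens on the cold path while the outer lock is held.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::{Arc, Mutex, OnceLock};

/// Stand-in for rusqlite::Connection.
struct Connection {
    path: PathBuf,
}

/// Stand-in for tj_core::db::open(); the real one runs PRAGMAs and migrations.
fn open(path: &Path) -> Connection {
    Connection { path: path.to_path_buf() }
}

type Cache = Mutex<HashMap<PathBuf, Arc<Mutex<Connection>>>>;
static CACHE: OnceLock<Cache> = OnceLock::new();

/// Return the shared connection for `path`, opening it on first use.
fn cached_open(path: &Path) -> Arc<Mutex<Connection>> {
    let cache = CACHE.get_or_init(|| Mutex::new(HashMap::new()));
    // Outer lock: taken only to look up or insert the entry.
    let mut map = cache.lock().unwrap();
    map.entry(path.to_path_buf())
        .or_insert_with(|| Arc::new(Mutex::new(open(path))))
        .clone()
}
```

A handler would then lock the returned inner mutex for the duration of its work, which is what serialises calls per state path.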
Adds a fourth output format that produces a self-contained SQLite
snapshot of the project's derived state. Useful for backups, sharing
the state with another machine, or offline analysis with sqlite3
queries.

Pipeline:
  1. Rebuild from JSONL (source of truth) so the snapshot reflects
     every event ever appended.
  2. VACUUM INTO a temp file produces a clean, defragmented copy.
  3. Stream the bytes to stdout so the user can redirect to a file.

Test export_sqlite_round_trips_through_pack:
  - Create a task in xdg_a + proj_a, append a decision event.
  - export --format sqlite, capture stdout.
  - Confirm the magic bytes ("SQLite format 3\0") are present.
  - Drop the snapshot under xdg_b/task-journal/state/<hash>.sqlite
    (no JSONL on the destination side).
  - task-journal pack from xdg_b finds the same task with the same
    decision text — the snapshot is read-only-self-contained.

Refs claude-memory-yj1.2
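
The magic-bytes check from the test can be written as a small helper. The helper and constant names are hypothetical; the 16-byte header string is the one quoted above.

```rust
/// The 16-byte header every SQLite 3 database file starts with.
const SQLITE_MAGIC: &[u8; 16] = b"SQLite format 3\0";

/// True if `bytes` begins with the SQLite 3 file header.
fn looks_like_sqlite(bytes: &[u8]) -> bool {
    bytes.starts_with(SQLITE_MAGIC)
}
```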
The auto-capture hook silently writes failed classifier results to
pending/<id>.json. Until now they sat there forever — the user had
no way to see what was queued or to flush them.

Two new subcommands:

  task-journal pending list
  task-journal pending retry [--mock-event-type X ...]

list prints id / queued_at / attempts / text-preview as a plain
table; --json deferred to a future epic if anyone asks.

retry walks the queue and re-feeds each entry through the classifier
(currently only the mock path is wired — the real classifier
roundtrip lives behind the install-hooks integration). Schema
adds an optional attempts counter; once it hits PENDING_MAX_ATTEMPTS
(=3) the entry is renamed to <id>.dead.json so list still surfaces
it but retry skips it.

Tests:
  - pending_list_shows_queued_entries
  - pending_retry_drains_with_mock_classifier (round-trips a fake
    queued entry into a real event in JSONL, visible in pack)
  - pending_retry_marks_dead_after_max_attempts

Refs claude-memory-yj1.3
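
The attempts bookkeeping can be sketched as a pure decision function. Everything except PENDING_MAX_ATTEMPTS and the <id>.dead.json naming is hypothetical, and it assumes the counter records failed attempts so far; the real command also rewrites and renames the files on disk.

```rust
use std::path::{Path, PathBuf};

const PENDING_MAX_ATTEMPTS: u32 = 3;

#[derive(Debug, PartialEq)]
enum RetryOutcome {
    /// Classified successfully: the pending file can be removed.
    Drained,
    /// Failed again but still under the limit: keep <id>.json with this count.
    Requeued { attempts: u32 },
    /// Limit reached: rename to this path so list shows it and retry skips it.
    Dead { renamed_to: PathBuf },
}

/// Decide what happens to one pending entry after a retry attempt.
fn after_retry(pending_dir: &Path, id: &str, attempts_so_far: u32, succeeded: bool) -> RetryOutcome {
    if succeeded {
        return RetryOutcome::Drained;
    }
    let attempts = attempts_so_far + 1;
    if attempts >= PENDING_MAX_ATTEMPTS {
        RetryOutcome::Dead { renamed_to: pending_dir.join(format!("{id}.dead.json")) }
    } else {
        RetryOutcome::Requeued { attempts }
    }
}

/// retry only picks up live entries; dead ones are left for list to show.
fn is_retryable(file_name: &str) -> bool {
    file_name.ends_with(".json") && !file_name.ends_with(".dead.json")
}
```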
Adds a real rmcp client integration test that verifies three
boundary contracts:

  - rmcp 0.3 with the client feature compiles against this
    workspace and the pinned toolchain.
  - CallToolRequestParam round-trips through serde — the JSON-RPC
    envelope shape is the same shape we marshal in tj-cli tests.
  - tokio::io::DuplexStream still satisfies the AsyncRead +
    AsyncWrite + Send + 'static bounds rmcp expects from a
    transport.

A previous draft of this test spun up an in-process server +
client over duplex and called task_create + event_add + task_pack
+ task_close end-to-end. That draft hung indefinitely because
TaskJournalServer is defined in the binary crate (main.rs) and
is not reachable from a black-box integration test. Driving the
real handlers needs the server moved into a library target —
tracked as a follow-up. Until then the CLI integration tests in
tj-cli/tests/cli.rs cover the same code paths end-to-end through
the same tj_core entry points the MCP handlers use.

Refs claude-memory-yj1.4
Today the MCP server emits no per-call telemetry — when a user
reports slowness or a stuck tool, the only signal is whatever the
client surfaces. Adds two INFO log lines around every handler:

  tool_call start  tool=task_pack  correlation_id=01J...
  tool_call ok     tool=task_pack  correlation_id=01J...  elapsed_ms=18

(The correlation_id is the same across both lines, so a grep on
correlation_id=01J... isolates one client request.)

Choice notes:
  - traced_tool helper wraps the existing async-fn body so the
    tool macro signature stays exactly the same. No tool_router
    re-derivation.
  - ULID instead of UUID v4: ULID is already a transitive dep
    (used for event_id), and the embedded timestamp orders log
    lines naturally without parsing a separate field.
  - On error the exit line drops to WARN level and includes the
    McpError.message so the failure cause shows up at default
    RUST_LOG=info without enabling debug noise.

Tests:
  - new_correlation_id_is_unique_across_thousand_calls
  - traced_tool_transparently_returns_inner_result (Ok + Err
    paths preserve the inner Result)

Refs claude-memory-yj1.5
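
A synchronous, std-only sketch of the wrapper idea. The real traced_tool wraps async handlers and logs through tracing with a ULID; here eprintln! and a process-local counter stand in for both, so the id format and log layout are assumptions.

```rust
use std::fmt::Display;
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Instant;

static NEXT_ID: AtomicU64 = AtomicU64::new(1);

/// Stand-in for a ULID: unique per process, which is all this sketch needs.
fn new_correlation_id() -> String {
    format!("req-{:08}", NEXT_ID.fetch_add(1, Ordering::Relaxed))
}

/// Run `body`, logging a start line and an ok/err line that share one
/// correlation id, and hand the inner result back unchanged.
fn traced_tool<T, E: Display>(tool: &str, body: impl FnOnce() -> Result<T, E>) -> Result<T, E> {
    let correlation_id = new_correlation_id();
    let started = Instant::now();
    eprintln!("INFO tool_call start tool={tool} correlation_id={correlation_id}");
    let result = body();
    let elapsed_ms = started.elapsed().as_millis();
    match &result {
        Ok(_) => eprintln!(
            "INFO tool_call ok tool={tool} correlation_id={correlation_id} elapsed_ms={elapsed_ms}"
        ),
        Err(e) => eprintln!(
            "WARN tool_call err tool={tool} correlation_id={correlation_id} elapsed_ms={elapsed_ms} error={e}"
        ),
    }
    result
}
```

Because the wrapper returns the inner Result untouched, callers keep their existing signatures, which is the property the second test above pins down.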
Today the MCP server runs the rmcp serve loop until the transport
closes, then exits. SIGTERM (e.g. from supervisord, systemd, docker
stop) hard-kills the process mid-write — JSONL log can be left
mid-line, tracing buffers are dropped, no shutdown ack ever lands
in the supervisor logs.

Now: main wraps the serve loop in tokio::select! against a new
wait_for_shutdown_signal() future:

  - On Unix: races Ctrl-C and SIGTERM, logs which one arrived.
  - On Windows: only Ctrl-C / Ctrl-Break is observable to a
    console binary; SIGTERM has no analogue, so we log only Ctrl-C.

Either branch logs an info line and returns 0. The drop of the
tokio runtime flushes tracing buffers as a side effect.

Adds the tokio signal feature to the workspace deps.

Test shutdown_signal_does_not_fire_spuriously races the shutdown
future against an immediately-ready future and asserts the ready
arm wins — i.e. nothing fires until a real signal arrives.

Refs claude-memory-yj1.6
Last commit of epic D. Workspace 0.2.0-rc.1 -> 0.2.1; the tj-core
dependency versions in tj-cli and tj-mcp are aligned to match. CHANGELOG gets a [0.2.1] section
listing the additive features (export sqlite, pending list/retry,
correlation_id tracing, graceful shutdown) and the internal
Connection cache perf change.

No breaking changes; this is a patch-level bump after 0.2.0 (the rc).
After dogfooding I will tag and publish.

Closes claude-memory-yj1.7
@Shahinyanm Shahinyanm merged commit 8169930 into main May 7, 2026
2 of 7 checks passed
@Shahinyanm Shahinyanm deleted the claude/v0.2.1-epic-d branch May 7, 2026 08:46