Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
40 changes: 40 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,46 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

## [0.23.0] - 2026-06-13

Finalize, retuned after running `complete` on real 12-session tasks: the fast,
reliable judge-only path is now the default, and the slow session-enrich pass is
opt-in.

### Changed
- **`complete` is judge-only by default; enrich is opt-in via `--enrich`.**
Finalizing through the model's judgment (retitle + close + outcome) takes
seconds and is what gives ~90% of the value. The session-backfill pass — one
`claude -p` call per session, minutes on a big multi-session task — proved too
slow to be the default, so it now runs only with `--enrich`. (The old `--quick`
flag is gone: its behaviour is the default. Replace `complete <id> --quick`
with `complete <id>`, and `complete <id>` with `complete <id> --enrich` if you
want the old full behaviour.)

### Fixed
- **`complete` survives a non-JSON enrich reply.** When the backfill model
answered with prose instead of the requested JSON array — e.g. continuing the
transcript's own dialogue ("Контекст в норме… Что дальше?") — the parse error
aborted the whole `complete`, losing the retitle and close. Backfill now skips
an unparseable chunk reply (with a warning), the parser extracts a JSON array
even when wrapped in prose, and the prompt re-asserts "output ONLY the JSON
array, do not continue the transcript" after the transcript.
- **Enrich chunks are sized for `claude -p`'s overhead.** `claude -p` is a full
Claude Code instance whose system prompt + tool definitions cost ~113k tokens
before our content, so the earlier 360k-char chunk still 400'd at ~204k total.
The per-call transcript budget drops to 150k chars (~37k tokens), and **any**
per-chunk failure (over-budget 400, transient error, non-JSON) is skipped
rather than aborting — a genuinely broken backend still surfaces at the judge
step.
- **No more apparent hang.** A big task makes many sequential `claude -p` calls;
without a timeout one wedged call hung the whole command with no output. Each
call now has a wall-clock timeout (90s, `TJ_CLAUDE_TIMEOUT_SECS`) that kills a
stuck `claude` (pipes drained in threads to avoid buffer deadlock), and enrich
prints an "enriching N session(s)…" progress line pointing at `--quick`.
- **Legible `claude -p` errors** (carried from the same investigation): a
non-zero exit now surfaces the JSON error claude prints on stdout, so failures
read as "Prompt is too long · ~204261 tokens" instead of a bare "exit 1".

## [0.22.1] - 2026-06-13

### Fixed
Expand Down
6 changes: 3 additions & 3 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ members = [
]

[workspace.package]
version = "0.22.1"
version = "0.23.0"
edition = "2021"
rust-version = "1.88"
license = "MIT"
Expand Down
2 changes: 1 addition & 1 deletion crates/tj-cli/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ default = ["embed"]
embed = ["tj-core/embed"]

[dependencies]
tj-core = { package = "task-journal-core", version = "0.22.1", path = "../tj-core", default-features = false }
tj-core = { package = "task-journal-core", version = "0.23.0", path = "../tj-core", default-features = false }
anyhow = { workspace = true }
clap = { workspace = true }
tracing = { workspace = true }
Expand Down
54 changes: 34 additions & 20 deletions crates/tj-cli/src/main.rs
Original file line number Diff line number Diff line change
Expand Up @@ -871,21 +871,22 @@ enum Commands {
#[arg(long)]
backend: Option<String>,
},
/// Finalize a task: enrich its memory from the sessions it touched, fix a
/// junk auto-title, and close it IF the events clearly show it is done —
/// the model decides from the content. Omit the id to finalize every open
/// task in the project (batch, with a reviewable list). One LLM call per
/// session for enrich + one judge call per task, via the chosen backend
/// (free with `--backend ollama`).
/// Finalize a task: fix a junk auto-title and close it IF the events
/// clearly show it is done — the model decides from the content, in
/// seconds. Omit the id to finalize every open task (batch, with a
/// reviewable list). Add `--enrich` to also re-read the task's sessions and
/// backfill missed events first — thorough but slow (one `claude -p` call
/// per session; minutes on a big multi-session task).
Complete {
/// The task id to finalize. Omit to finalize all open tasks (batch).
task: Option<String>,
/// Show scope and planned actions without calling the model or writing.
#[arg(long)]
dry_run: bool,
/// Skip the (heavy) enrich pass; judge/retitle/close from stored events only.
/// Also backfill missed events from the task's sessions before judging.
/// Thorough but slow (one `claude -p` call per session).
#[arg(long)]
quick: bool,
enrich: bool,
/// Required for batch finalize when stdin is not an interactive terminal.
#[arg(long)]
yes: bool,
Expand Down Expand Up @@ -2784,12 +2785,12 @@ fn main() -> Result<()> {
Commands::Complete {
task,
dry_run,
quick,
enrich,
yes,
backend,
} => match task {
Some(id) => run_complete_single(&id, dry_run, quick, backend.as_deref())?,
None => run_complete_batch(dry_run, quick, yes, backend.as_deref())?,
Some(id) => run_complete_single(&id, dry_run, enrich, backend.as_deref())?,
None => run_complete_batch(dry_run, enrich, yes, backend.as_deref())?,
},
Commands::Export {
format,
Expand Down Expand Up @@ -4153,6 +4154,14 @@ fn enrich_task(
if sessions.is_empty() {
return Ok(0);
}
// Enrich is the slow part — one (or more, for big transcripts) `claude -p`
// call per session. Announce it so a multi-minute run doesn't look hung;
// `--quick` skips this entirely.
eprintln!(
"complete: enriching {} session(s) via {} — can take a few minutes (or use --quick to skip)…",
sessions.len(),
llm.name()
);
let run_id = ulid::Ulid::new().to_string();
let dream_backend = tj_core::dream::llm_backend::LlmDreamBackend::new(llm);
let opts = tj_core::dream::DreamOptions {
Expand Down Expand Up @@ -4206,7 +4215,7 @@ fn task_event_lines(conn: &rusqlite::Connection, task_id: &str) -> anyhow::Resul
fn finalize_one_task(
ctx: &ProjectCtx<'_>,
task_id: &str,
quick: bool,
enrich: bool,
dry_run: bool,
backend: Option<&str>,
) -> anyhow::Result<FinalizeOutcome> {
Expand All @@ -4215,8 +4224,9 @@ fn finalize_one_task(
let events_path = ctx.events_path;
let project_hash = ctx.project_hash;

// 1. Enrich (unless quick / dry-run) — needs sessions and a backend.
if !quick && !dry_run {
// 1. Enrich (only when asked, and not on a dry-run) — needs sessions and a
// backend. Off by default because it is slow (one claude -p per session).
if enrich && !dry_run {
if let Some(dir) = ctx.project_dir {
if let Some(llm) = tj_core::llm::backend_from_env(backend)? {
out.enriched = enrich_task(conn, events_path, project_hash, dir, task_id, llm)?;
Expand Down Expand Up @@ -4331,7 +4341,7 @@ PATH; or pick one via --backend / TJ_BACKEND: anthropic, openai, ollama (free, l
fn run_complete_single(
task_id: &str,
dry_run: bool,
quick: bool,
enrich: bool,
backend: Option<&str>,
) -> anyhow::Result<()> {
let cwd = std::env::current_dir()?;
Expand All @@ -4352,7 +4362,7 @@ fn run_complete_single(
project_hash: &project_hash,
project_dir: project_dir.as_deref(),
};
let out = finalize_one_task(&ctx, task_id, quick, dry_run, backend)?;
let out = finalize_one_task(&ctx, task_id, enrich, dry_run, backend)?;
print_finalize_outcome(task_id, &out);
Ok(())
}
Expand All @@ -4361,7 +4371,7 @@ fn run_complete_single(
/// user can prune before confirming. Refuses without a TTY unless `--yes`.
fn run_complete_batch(
dry_run: bool,
quick: bool,
enrich: bool,
yes: bool,
backend: Option<&str>,
) -> anyhow::Result<()> {
Expand Down Expand Up @@ -4417,7 +4427,7 @@ fn run_complete_batch(
if dry_run {
println!();
for (id, _) in &open {
finalize_one_task(&ctx, id, quick, true, backend)?;
finalize_one_task(&ctx, id, enrich, true, backend)?;
}
return Ok(());
}
Expand Down Expand Up @@ -4457,7 +4467,11 @@ fn run_complete_batch(
println!(
"\nWill finalize {} task(s){}. Proceed? [y/N]",
targets.len(),
if quick { " (quick: no enrich)" } else { "" }
if enrich {
" (with --enrich: slow, reads sessions)"
} else {
""
}
);
let mut buf = String::new();
std::io::stdin().read_line(&mut buf)?;
Expand All @@ -4469,7 +4483,7 @@ fn run_complete_batch(

let mut left_open: Vec<(String, String)> = Vec::new();
for (id, _) in &targets {
let out = finalize_one_task(&ctx, id, quick, false, backend)?;
let out = finalize_one_task(&ctx, id, enrich, false, backend)?;
print_finalize_outcome(id, &out);
if out.skipped_no_backend {
println!("complete: stopping batch — no LLM backend available.");
Expand Down
8 changes: 4 additions & 4 deletions crates/tj-cli/tests/cli.rs
Original file line number Diff line number Diff line change
Expand Up @@ -5549,10 +5549,10 @@ fn complete_batch_dry_run_lists_open_tasks() {
/// `claude` on PATH returning a canned judgment. Proves the wiring: junk
/// title → Rename, done verdict → Close with a persisted outcome. Unix-only
/// (shell-script stub); the logic itself is covered cross-platform by the
/// finalize.rs unit tests.
/// finalize.rs unit tests. Default mode (judge-only, no `--enrich`).
#[cfg(unix)]
#[test]
fn complete_quick_retitles_and_closes_via_fake_backend() {
fn complete_retitles_and_closes_via_fake_backend() {
use std::os::unix::fs::PermissionsExt;

let dir = assert_fs::TempDir::new().unwrap();
Expand Down Expand Up @@ -5609,14 +5609,14 @@ fn complete_quick_retitles_and_closes_via_fake_backend() {
.trim()
.to_string();

// --quick: skip enrich (no sessions), exercise judge → retitle → close.
// Default mode (judge-only): exercise judge → retitle → close.
Command::cargo_bin("task-journal")
.unwrap()
.current_dir(proj.path())
.env("XDG_DATA_HOME", dir.path())
.env("PATH", &path_env)
.env_remove("ANTHROPIC_API_KEY")
.args(["complete", &task_id, "--quick"])
.args(["complete", &task_id])
.assert()
.success()
.stdout(contains("retitled"))
Expand Down
67 changes: 62 additions & 5 deletions crates/tj-core/src/classifier/agent_sdk.rs
Original file line number Diff line number Diff line change
Expand Up @@ -93,12 +93,71 @@ fn claude_exit_error(
anyhow!("`claude -p` exited with {status}: {detail}")
}

/// Per-call wall-clock ceiling for a `claude -p` invocation. A spawned full
/// Claude Code instance normally answers in seconds; this kills a wedged one so
/// a multi-chunk enrich can't hang the whole `complete`. Override with
/// `TJ_CLAUDE_TIMEOUT_SECS`.
fn claude_timeout() -> std::time::Duration {
let secs = std::env::var("TJ_CLAUDE_TIMEOUT_SECS")
.ok()
.and_then(|s| s.parse::<u64>().ok())
.filter(|n| *n > 0)
.unwrap_or(90);
std::time::Duration::from_secs(secs)
}

/// Wait for `child` up to `timeout`, draining stdout/stderr concurrently so a
/// full pipe can't deadlock the wait. On timeout the child is killed and an
/// error returned; otherwise the captured output is handed back.
fn wait_with_timeout(
mut child: std::process::Child,
timeout: std::time::Duration,
) -> anyhow::Result<std::process::Output> {
use std::io::Read;
let mut out_pipe = child.stdout.take();
let mut err_pipe = child.stderr.take();
let so = std::thread::spawn(move || {
let mut b = Vec::new();
if let Some(p) = out_pipe.as_mut() {
let _ = p.read_to_end(&mut b);
}
b
});
let se = std::thread::spawn(move || {
let mut b = Vec::new();
if let Some(p) = err_pipe.as_mut() {
let _ = p.read_to_end(&mut b);
}
b
});
let start = std::time::Instant::now();
let status = loop {
if let Some(status) = child.try_wait()? {
break status;
}
if start.elapsed() >= timeout {
let _ = child.kill();
let _ = child.wait();
anyhow::bail!("`claude -p` timed out after {}s", timeout.as_secs());
}
std::thread::sleep(std::time::Duration::from_millis(150));
};
Ok(std::process::Output {
status,
stdout: so.join().unwrap_or_default(),
stderr: se.join().unwrap_or_default(),
})
}

impl CommandRunner for ClaudeBinaryRunner {
fn run(&self, model: &str, prompt: &str) -> anyhow::Result<String> {
let output = base_claude_command(model)
let child = base_claude_command(model)
.arg(prompt)
.output()
.stdout(std::process::Stdio::piped())
.stderr(std::process::Stdio::piped())
.spawn()
.context("failed to spawn `claude` (is Claude Code installed and on PATH?)")?;
let output = wait_with_timeout(child, claude_timeout())?;
if !output.status.success() {
return Err(claude_exit_error(
output.status,
Expand Down Expand Up @@ -135,9 +194,7 @@ impl CommandRunner for ClaudeBinaryStdinRunner {
.context("claude stdin was not captured")?
.write_all(prompt.as_bytes())
.context("failed to write prompt to claude stdin")?;
let output = child
.wait_with_output()
.context("failed to wait for `claude`")?;
let output = wait_with_timeout(child, claude_timeout())?;
if !output.status.success() {
return Err(claude_exit_error(
output.status,
Expand Down
Loading
Loading