
feat: complete judge-only by default, --enrich opt-in + enrich hardening (0.23.0)#41

Merged
Shahinyanm merged 5 commits into main from fix/enrich-nonjson-resilience
Jun 13, 2026

@Shahinyanm

Member

Problem

After the chunking fix (0.22.1), complete hit a different failure in production:

Error: dream JSON parse failed; got: Контекст в норме. 566.5k/1M (57%) использовано... Что дальше?

(The quoted reply is Russian for "Context is fine. 566.5k/1M (57%) used... What's next?")

The backfill model replied with prose instead of the JSON array — it continued the transcript's own dialogue. The parse error propagated up and aborted the entire complete, throwing away the retitle and close too.

Fix

Backfill is best-effort and must never sink the finalize:

  • Skip unparseable chunks. A chunk whose reply isn't a JSON array is logged (tracing::warn) and contributes no events, instead of erroring out. The surrounding retitle/close still run.
  • Extract array from prose. parse_backfill_json slices to the outer [...], tolerating a JSON array wrapped in surrounding text.
  • Re-assert the contract. The prompt now repeats "output ONLY the JSON array … do not continue the transcript" after the transcript (recency), reducing the chance the model chats back.
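
The first two bullets can be illustrated with a minimal, std-only sketch. It is not the PR's code: `slice_outer_array` and `collect_best_effort` are hypothetical names, `eprintln!` stands in for `tracing::warn`, and the step where the real `parse_backfill_json` would presumably hand the slice to a JSON parser is left out. The sketch only shows slicing from the first `[` to the last `]` and skipping replies that contain no such span.

```rust
/// Hypothetical helper: returns the outermost `[...]` span of a model reply,
/// or `None` when the reply contains no bracketed span at all.
fn slice_outer_array(reply: &str) -> Option<&str> {
    let start = reply.find('[')?;
    let end = reply.rfind(']')?;
    if end < start {
        return None;
    }
    // '[' and ']' are single-byte ASCII, so both indices are char boundaries.
    Some(&reply[start..=end])
}

/// Hypothetical best-effort loop: unusable replies are logged and skipped,
/// so one bad chunk never turns into an error for the caller.
fn collect_best_effort<'a>(replies: &[&'a str]) -> Vec<&'a str> {
    let mut arrays = Vec::new();
    for (i, reply) in replies.iter().copied().enumerate() {
        match slice_outer_array(reply) {
            Some(array) => arrays.push(array),
            None => eprintln!("warn: chunk {i}: reply is not a JSON array, skipping"),
        }
    }
    arrays
}
```

A slice found this way is only a candidate: prose that happens to contain brackets still has to fail in the JSON parser and be skipped like any other unparseable chunk.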

Tests

  • parse_extracts_array_wrapped_in_prose, parse_errors_on_pure_prose, backfill_skips_unparseable_chunk_reply (the exact prod reply), plus the existing chunking tests.
  • Full local gate green: fmt --check, clippy --workspace --all-targets -D warnings, test --workspace, lean build.

🤖 Generated with Claude Code

Shahinyanm and others added 5 commits June 13, 2026 20:23
The backfill model sometimes answers with prose instead of the JSON
array, e.g. continuing the transcript's own dialogue ('Контекст в
норме... Что дальше?', Russian for 'Context is fine... What's next?').
The parse error aborted the whole `complete`,
losing the retitle and close. Backfill is best-effort: skip an
unparseable chunk reply (warn), extract a JSON array even when wrapped
in prose, and re-assert 'output ONLY the JSON array, do not continue the
transcript' after the transcript. Retitle/close run regardless of what
enrich recovers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
claude -p is a full Claude Code instance: its system prompt + tool
definitions cost ~113k tokens before our content, so a 360k-char chunk
(~91k tokens) still 400'd at ~204k total. Drop TRANSCRIPT_CHAR_BUDGET to
150k chars (~37k tokens) and make backfill swallow ANY per-chunk error
(over-budget 400, transient failure, non-JSON reply) — enrich is strictly
best-effort and never sinks the retitle/close. A truly broken backend
still surfaces at the judge step.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
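
The budget arithmetic in the commit above can be checked with a small sketch. The ratio of 4 characters per token is an assumption inferred from the commit's own figures (360k chars ≈ 91k tokens, 150k chars ≈ 37k tokens), and `estimated_total_tokens` and `chunk_by_chars` are hypothetical names. Whether the real `TRANSCRIPT_CHAR_BUDGET` counts characters or bytes is not stated, so the sketch assumes characters.

```rust
/// Assumed ratio, inferred from the figures quoted in the commit message.
const CHARS_PER_TOKEN: usize = 4;
/// Fixed `claude -p` overhead reported in the commit message (~113k tokens).
const OVERHEAD_TOKENS: usize = 113_000;

/// Rough total request size for one chunk of `chunk_chars` characters.
fn estimated_total_tokens(chunk_chars: usize) -> usize {
    OVERHEAD_TOKENS + chunk_chars / CHARS_PER_TOKEN
}

/// Hypothetical chunker: splits `text` into pieces of at most `budget`
/// characters, always cutting on a UTF-8 character boundary.
fn chunk_by_chars(text: &str, budget: usize) -> Vec<&str> {
    assert!(budget > 0, "budget must be positive");
    let mut chunks = Vec::new();
    let mut start = 0;
    let mut count = 0;
    for (idx, _) in text.char_indices() {
        if count == budget {
            chunks.push(&text[start..idx]);
            start = idx;
            count = 0;
        }
        count += 1;
    }
    if start < text.len() {
        chunks.push(&text[start..]);
    }
    chunks
}
```

Under these assumptions a 360k-char chunk comes to about 203k tokens in total, in line with the ~204k reported, while a 150k-char chunk comes to about 150k.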
A big task makes many sequential claude -p calls (one+ per session);
without a timeout one wedged call hung the whole complete, with no
output so it looked dead. Add a per-call wall-clock timeout (90s,
TJ_CLAUDE_TIMEOUT_SECS) that kills a stuck claude and drains pipes in
threads to avoid buffer deadlock; a timed-out chunk is skipped (enrich is
best-effort). Print an 'enriching N session(s)…' progress line so a
multi-minute run is legible, and point at --quick.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
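
The timeout approach described in the commit above could look roughly like the following std-only sketch. Only the `TJ_CLAUDE_TIMEOUT_SECS` variable and the 90s default come from the commit message; `timeout_from_env`, `run_with_timeout`, and the 20ms polling interval are assumptions made for illustration, and the actual implementation may differ.

```rust
use std::io::Read;
use std::process::{Command, Stdio};
use std::thread;
use std::time::{Duration, Instant};

/// Reads the per-call timeout from TJ_CLAUDE_TIMEOUT_SECS, defaulting to 90s.
fn timeout_from_env() -> Duration {
    let secs = std::env::var("TJ_CLAUDE_TIMEOUT_SECS")
        .ok()
        .and_then(|v| v.parse::<u64>().ok())
        .unwrap_or(90);
    Duration::from_secs(secs)
}

/// Runs `cmd` and returns `Ok(Some(stdout))` if it exits in time,
/// or `Ok(None)` if it had to be killed after `timeout`.
fn run_with_timeout(mut cmd: Command, timeout: Duration) -> std::io::Result<Option<String>> {
    let mut child = cmd
        .stdin(Stdio::null())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;

    // Drain both pipes in threads so a chatty child cannot fill a pipe
    // buffer and block while we are waiting on it.
    let mut out = child.stdout.take().expect("stdout was piped");
    let mut err = child.stderr.take().expect("stderr was piped");
    let out_reader = thread::spawn(move || {
        let mut buf = String::new();
        let _ = out.read_to_string(&mut buf);
        buf
    });
    let err_reader = thread::spawn(move || {
        let mut buf = Vec::new();
        let _ = err.read_to_end(&mut buf);
    });

    let deadline = Instant::now() + timeout;
    let finished = loop {
        if child.try_wait()?.is_some() {
            break true;
        }
        if Instant::now() >= deadline {
            let _ = child.kill(); // stuck call: kill it ...
            let _ = child.wait(); // ... and reap it
            break false;
        }
        thread::sleep(Duration::from_millis(20));
    };

    let stdout = out_reader.join().unwrap_or_default();
    let _ = err_reader.join();
    Ok(if finished { Some(stdout) } else { None })
}
```

A caller in the best-effort style would treat `Ok(None)` the same way as an unparseable reply: log it, skip the chunk, and move on.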
…, progress)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Running complete on real 12-session tasks showed the session-backfill
pass takes 10-15 min (dozens of sequential claude -p spawns, ~113k-token
overhead each) — too slow to be the default. The judge-only path
(retitle + close + outcome) is seconds and delivers ~90% of the value.

Flip the default: complete <id> now finalizes via judgment only; add
--enrich to also backfill missed events from sessions. The old --quick
flag is removed (its behaviour is the default). Behaviour change → 0.23.0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
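
The flipped default can be pictured with a hand-rolled argument sketch. It makes no claim about how the tool actually parses its command line: `CompleteOpts` and `parse_complete_args` are hypothetical, and the only facts taken from the commit are that `complete <id>` is judge-only by default, `--enrich` opts in to backfill, and `--quick` no longer exists.

```rust
/// Hypothetical options for `complete <id> [--enrich]`.
#[derive(Debug, PartialEq)]
struct CompleteOpts {
    id: String,
    /// Off by default: judge-only (retitle + close + outcome).
    enrich: bool,
}

/// Parses the arguments that follow the `complete` subcommand.
fn parse_complete_args<I, S>(args: I) -> Result<CompleteOpts, String>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut id: Option<String> = None;
    let mut enrich = false;
    for arg in args {
        let arg: &str = arg.as_ref();
        match arg {
            "--enrich" => enrich = true,
            // Any other flag, including the removed `--quick`, is rejected.
            flag if flag.starts_with("--") => return Err(format!("unknown flag: {flag}")),
            value if id.is_none() => id = Some(value.to_string()),
            extra => return Err(format!("unexpected argument: {extra}")),
        }
    }
    let id = id.ok_or_else(|| "missing task id".to_string())?;
    Ok(CompleteOpts { id, enrich })
}
```

Keeping the slow path behind an explicit boolean means the fast path needs no flag at all, which is why the old `--quick` can simply disappear.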
@Shahinyanm Shahinyanm changed the title from "fix: complete tolerates non-JSON enrich replies (0.22.2)" to "feat: complete judge-only by default, --enrich opt-in + enrich hardening (0.23.0)" on Jun 13, 2026
@Shahinyanm

Member Author

Updated: this PR now also flips the default: complete is judge-only (retitle + close + outcome, seconds) by default, and the slow session-enrich is opt-in via --enrich. This came out of running it on a real 12-session task, where full enrich took 10-15 min. Bumped to 0.23.0 (behaviour change). The hardening commits (chunk sizing, per-call timeout, best-effort skip, progress, legible errors) remain.

@Shahinyanm Shahinyanm merged commit 2d7171e into main Jun 13, 2026
7 checks passed