feat: complete judge-only by default, --enrich opt-in + enrich hardening (0.23.0)#41
Merged
Merged
Conversation
The backfill model sometimes answers with prose instead of the JSON
array — e.g. continuing the transcript's own dialogue ('Контекст в
норме... Что дальше?'). The parse error aborted the whole `complete`,
losing the retitle and close. Backfill is best-effort: skip an
unparseable chunk reply (warn), extract a JSON array even when wrapped
in prose, and re-assert 'output ONLY the JSON array, do not continue the
transcript' after the transcript. Retitle/close run regardless of what
enrich recovers.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
claude -p is a full Claude Code instance: its system prompt + tool definitions cost ~113k tokens before our content, so a 360k-char chunk (~91k tokens) still 400'd at ~204k total. Drop TRANSCRIPT_CHAR_BUDGET to 150k chars (~37k tokens) and make backfill swallow ANY per-chunk error (over-budget 400, transient failure, non-JSON reply) — enrich is strictly best-effort and never sinks the retitle/close. A truly broken backend still surfaces at the judge step. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A big task makes many sequential claude -p calls (one+ per session); without a timeout one wedged call hung the whole complete, with no output so it looked dead. Add a per-call wall-clock timeout (90s, TJ_CLAUDE_TIMEOUT_SECS) that kills a stuck claude and drains pipes in threads to avoid buffer deadlock; a timed-out chunk is skipped (enrich is best-effort). Print an 'enriching N session(s)…' progress line so a multi-minute run is legible, and point at --quick. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…, progress) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Running complete on real 12-session tasks showed the session-backfill pass takes 10-15 min (dozens of sequential claude -p spawns, ~113k-token overhead each) — too slow to be the default. The judge-only path (retitle + close + outcome) is seconds and delivers ~90% of the value. Flip the default: complete <id> now finalizes via judgment only; add --enrich to also backfill missed events from sessions. The old --quick flag is removed (its behaviour is the default). Behaviour change → 0.23.0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Member
Author
|
Updated: this PR now also flips the default — |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
After the chunking fix (0.22.1),
completehit a different failure in production:The backfill model replied with prose instead of the JSON array — it continued the transcript's own dialogue. The parse error propagated up and aborted the entire
complete, throwing away the retitle and close too.Fix
Backfill is best-effort and must never sink the finalize:
tracing::warn) and contributes no events, instead of erroring out. The surrounding retitle/close still run.parse_backfill_jsonslices to the outer[...], tolerating a JSON array wrapped in surrounding text.Tests
parse_extracts_array_wrapped_in_prose,parse_errors_on_pure_prose,backfill_skips_unparseable_chunk_reply(the exact prod reply), plus the existing chunking tests.fmt --check,clippy --workspace --all-targets -D warnings,test --workspace, lean build.🤖 Generated with Claude Code