Add --llm-reduce: one LLM prompt over all batch-transcribe results #179

Merged
alexkroman merged 12 commits into main from feat/llm-reduce on Jun 16, 2026

@alexkroman (Collaborator)

Summary

Adds a repeatable --llm-reduce 'PROMPT' flag to assembly transcribe — a map-reduce for batch transcription. --llm is the per-source map (runs over each transcript); --llm-reduce is the reduce (one LLM-Gateway call over all sources' results).

assembly transcribe --from-stdin --concurrency 3 --speaker-labels \
  --llm 'Judge diarization quality; output JSON {speaker_count, issues, score}' \
  --llm-reduce 'Rank these videos worst-to-best and summarize the failure modes'

Behavior

  • Batch mode: after every source transcribes, each source contributes its last --llm output (or its transcript text if no --llm ran), concatenated under ### Source: headers, and one reduce chain runs over the combined text. The result prints to stdout; the progress table is routed to stderr so stdout stays pipe-clean. Under --json, a final additive {"type":"reduce","model","prompts","output"} NDJSON record follows the per-source result records.
  • Single-source mode: nothing to aggregate, so the reduce prompts extend the --llm chain over the one transcript.
  • A reduce only runs after the batch fully succeeds (_summarize raises on any failure first), and is skipped — with a stderr warning — when there is nothing to reduce (avoids a billable call over empty input).
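The batch-mode gathering and the empty-input guard described above can be sketched roughly as follows. This is an illustration, not the PR's actual code: `SourceResult`, `gather_reduce_inputs`, and `should_reduce` are hypothetical names, the layout beyond the `### Source:` prefix is an assumption, and dropping sources that contribute only whitespace is a choice made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class SourceResult:
    """Hypothetical per-source result; the field names are assumptions."""
    source: str
    transcript_text: str = ""
    llm_outputs: list[str] = field(default_factory=list)


def gather_reduce_inputs(results: list[SourceResult]) -> str:
    """Concatenate each source's contribution under a '### Source:' header."""
    sections = []
    for r in results:
        # Prefer the last --llm (map) output; fall back to the transcript text
        # when no --llm step ran for this source.
        body = r.llm_outputs[-1] if r.llm_outputs else r.transcript_text
        if body.strip():
            sections.append(f"### Source: {r.source}\n{body.strip()}")
    return "\n\n".join(sections)


def should_reduce(combined: str) -> bool:
    """Return False when there is nothing to send, so the billable call is skipped."""
    return bool(combined.strip())


results = [
    SourceResult("a.mp4", "hello world", ['{"score": 3}']),  # map output wins
    SourceResult("b.mp4", "plain transcript"),               # transcript fallback
    SourceResult("c.mp4"),                                   # contributes nothing
]
combined = gather_reduce_inputs(results)
print(combined)
print("run reduce:", should_reduce(combined))
```

In this sketch the caller would warn on stderr and return early whenever `should_reduce` is false, which mirrors the skip behavior in the last bullet.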

Touches only transcribe's own modules (commands/transcribe.py, app/transcribe/{run,batch}.py) plus docs/snapshot — no shared-file edits.

Test Plan

  • ./scripts/check.sh green: lint, mypy/pyright, 100% patch coverage, diff-scoped mutation gate, build + twine
  • New tests/test_transcribe_reduce.py: flag plumbing, single-source chain, batch reduce (map-output + transcript-text fallback), {"type":"reduce"} NDJSON, stdout/stderr routing, empty-reduce skip, helper edge branches
  • transcribe --help snapshot regenerated (incl. the t alias)
  • REFERENCE.md + README documented; docs-consistency gate passes
  • Final code review addressed (empty-combined-text guard added)

Design spec and implementation plan: docs/superpowers/specs/2026-06-16-llm-reduce-batch-transcribe-design.md, docs/superpowers/plans/2026-06-16-llm-reduce.md.

🤖 Generated with Claude Code

alexkroman-assembly and others added 12 commits June 16, 2026 08:02
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Kills a mutation-gate survivor: the default was dead since run_batch always
passes reduce_active.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Avoids a billable LLM-Gateway call (and junk on stdout) when every source's
transcript text and --llm output are empty; warns on stderr instead. Closes the
empty-combined-text gap from final review.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
"Nothing to reduce: no transcript text across sources.", json_mode=json_mode
)
return
result = llm.run_chain(

The llm.run_chain call receives transcript_text assembled from user transcripts; do not send user transcripts to external services without sanitization or explicit opt-in.

Details

✨ AI Reasoning
The new _run_reduce calls llm.run_chain(api_key, transform.reduce_prompts, transcript_text=combined, ...). The combined string is constructed from user transcripts (see _gather_reduce_inputs) and is passed unchanged as transcript_text to an external LLM-Gateway. This is effectively logging/exfiltrating user-controlled content to a third party and may leak PII or otherwise sensitive data if not sanitized or consented to.

🔧 How do I fix it?
Keep sensitive data such as emails, passwords, and tokens out of logs. When logging values tied to a user, prefer a safe identifier like a user ID over the raw input, and strip line breaks from any user-provided text you do log.
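One possible way to act on this concern is to redact obvious identifiers before the combined text leaves the process, or to gate raw transmission behind an explicit opt-in. The sketch below is a hypothetical illustration only: `redact` and `prepare_reduce_input` are invented names, nothing like them is confirmed to exist in this PR, and the two regular expressions catch only simple email and phone-like patterns rather than PII in general.

```python
import re

# Deliberately simple patterns; real PII detection needs far more than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def prepare_reduce_input(combined: str, *, allow_raw: bool = False) -> str:
    """Return the text to send: raw only on explicit opt-in, redacted otherwise."""
    return combined if allow_raw else redact(combined)


sample = "mail bob@example.com or call +1 (555) 123-4567 now"
print(prepare_reduce_input(sample))
```

Whether redaction belongs in the CLI at all is a design question for the maintainers; a user who runs `--llm-reduce` has arguably already asked for the transcripts to be sent to the LLM.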


@alexkroman alexkroman added this pull request to the merge queue Jun 16, 2026
Merged via the queue into main with commit 8ab9970 Jun 16, 2026
19 checks passed
@alexkroman alexkroman deleted the feat/llm-reduce branch June 16, 2026 17:01
alexkroman pushed a commit that referenced this pull request Jun 16, 2026
Mirrors the map-reduce LLM vocabulary `transcribe` gained in #179, completing
the pipeline `transcripts list | transcripts get` so fetched transcripts can be
summarized or aggregated without a second tool:

    # map: summarize each transcript in a piped list
    assembly transcripts list --json | assembly transcripts get --llm "Summarize this call"

    # reduce: one ranking across all of them
    assembly transcripts list --json | assembly transcripts get --llm-reduce "Rank these worst-to-best"

`--llm` runs a per-transcript chain (server-injected by id via
`llm.run_chain_steps`); `--llm-reduce` runs one chain over all fetched
transcripts (`llm.run_chain`), emitting the same additive
`{"type":"reduce",...}` NDJSON record transcribe does. A single positional id
folds the reduce prompts into the `--llm` chain (nothing to aggregate), matching
transcribe's single-source behavior. Human-readable (non-JSON) reduce keeps stdout clean by
suppressing the per-transcript output; --json keeps the per-id stream and
appends the reduce record. Reuses core/llm.py and transcribe's
render_transform_steps — no new engine.
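A downstream consumer of the `--json` stream might separate the per-transcript records from the trailing reduce record along these lines. This is a sketch under assumptions: only the `type`, `model`, `prompts`, and `output` keys of the reduce record come from the description above, while `split_records`, the shape of the per-transcript records, and all sample values are hypothetical.

```python
import json


def split_records(lines):
    """Split NDJSON lines into (per-source records, reduce record or None)."""
    per_source, reduce_record = [], None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the stream
        record = json.loads(line)
        if record.get("type") == "reduce":
            reduce_record = record
        else:
            per_source.append(record)
    return per_source, reduce_record


# Hypothetical sample stream; real per-transcript records will have other fields.
sample = [
    '{"id": "t1", "text": "first call"}',
    '{"id": "t2", "text": "second call"}',
    '{"type": "reduce", "model": "example-model", '
    '"prompts": ["Rank these worst-to-best"], "output": "t2, t1"}',
]
per_source, reduce_record = split_records(sample)
print(len(per_source), "per-source records")
print("reduce output:", reduce_record["output"])
```

In real use the lines would come from `sys.stdin` instead of a list. Because the reduce record is additive, a consumer that ignores unknown `type` values keeps working unchanged.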

The get/list pipeline tests move to tests/test_transcripts_pipeline.py to keep
both test modules under the 500-line gate.

https://claude.ai/code/session_01GdpgpKDUJCNvECP5ttowst