feat(ingest): paste handler + Whisper env-config + paste detection #7
Merged
Three coherent, long-pending ingest improvements that grew from the original session work, plus the matching tests.

- lib/ingest/paste.ts (+ paste.test.ts): ingest a local plain-text/markdown file as a "paste" source. Mirrors the pdf.ts pattern: copy the raw file, derive the title from the first H1 or the filename, and write a "paste" source file via generateSourceId/slugify. No `url` field (paste sources have no scrapeable origin; gray-matter can't serialize undefined).
- lib/ingest/detect.ts (+ detect.test.ts): detect local text/markdown files as type "paste" so `npm run ingest <file.txt>` routes through the new paste handler instead of erroring out.
- ingest.ts: wire the paste handler into the dispatch.
- lib/ingest/audio.ts: the Whisper invocation is now env-configurable (ZUHN_WHISPER_MODEL, ZUHN_WHISPER_TASK, ZUHN_WHISPER_LANGUAGE) and auto-detects the language by default. Fixes a real bug: the previous hardcoded `--model base --language en` confabulated on non-English audio (forcing English on the weakest model produced fluent hallucination and repetition loops on Korean audio).

Carries the KB_ROOT-import refactor for these 4 files; the same refactor across the other 39 scripts is in PR #6. They are split out because these 4 also have substantive non-refactor changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
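For readers who want a concrete picture of the env-driven Whisper invocation in lib/ingest/audio.ts, here is a minimal sketch. It is not the code from this PR: `buildWhisperArgs` and the `Env` type are hypothetical names, and the behavior when a variable is unset (omit the flag and let the Whisper CLI apply its own default) is an assumption. The only parts taken from the description are the three variable names and the fact that no language is forced by default. `--model`, `--task`, and `--language` are flags of the openai-whisper CLI.

```typescript
// Hypothetical helper, for illustration only; not the code in lib/ingest/audio.ts.
type Env = Record<string, string | undefined>;

// The openai-whisper CLI accepts these two values for --task.
const WHISPER_TASKS: readonly string[] = ["transcribe", "translate"];

function buildWhisperArgs(audioPath: string, env: Env): string[] {
  const args: string[] = [audioPath];

  // Assumption: an unset or blank variable means "omit the flag",
  // so the CLI falls back to its own default model.
  const model = (env.ZUHN_WHISPER_MODEL ?? "").trim();
  if (model) args.push("--model", model);

  const task = (env.ZUHN_WHISPER_TASK ?? "").trim();
  if (task) {
    if (WHISPER_TASKS.indexOf(task) === -1) {
      throw new Error(
        `ZUHN_WHISPER_TASK must be one of ${WHISPER_TASKS.join(", ")}; got "${task}"`
      );
    }
    args.push("--task", task);
  }

  // No --language flag unless explicitly requested: Whisper then
  // auto-detects the spoken language instead of being forced to "en".
  const language = (env.ZUHN_WHISPER_LANGUAGE ?? "").trim();
  if (language) args.push("--language", language);

  return args;
}
```

In a Node script the result could be passed to `child_process.spawn("whisper", buildWhisperArgs(file, process.env))`. Taking the environment as a parameter rather than reading `process.env` inside the function keeps the helper easy to unit-test.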
Summary
Three long-pending ingest improvements (each useful, all sitting uncommitted) shipped together as one coherent unit:
- `lib/ingest/paste.ts` (+ test) lets `npm run ingest <file.txt>` work on local text/markdown. Mirrors the `pdf.ts` pattern (copy raw, derive title from first H1 or filename, write a `paste` source).
- `lib/ingest/detect.ts` (+ test) routes local text/markdown to the paste handler instead of erroring.
- `lib/ingest/audio.ts` becomes configurable (`ZUHN_WHISPER_MODEL`, `ZUHN_WHISPER_TASK`, `ZUHN_WHISPER_LANGUAGE`) and auto-detects language by default. Fixes a real bug: the previous hardcoded `--model base --language en` confabulated on non-English audio (forcing English on the weakest model produced fluent hallucination on Korean audio).

Note on overlap with PR #6
These 4 files also carry the `KB_ROOT`-import swap. PR #6 (the 39-file mechanical refactor) deliberately excluded them because they have substantive non-refactor changes that belong in their own PR. Independent diffs, no conflict.

Test plan
🤖 Generated with Claude Code