Canonical repository: WalksWithASwagger/spektorAI.
License posture: no open-source license is currently granted. Do not treat the
archived WalksWithASwagger/whisperforge MIT license as inherited by this repo.
Turn spoken thoughts into polished, publishable content. Upload audio, and WhisperForge transcribes it, extracts wisdom, drafts an outline and article, generates social posts + image prompts, and ships the bundle to a structured Notion page — all tuned to your own voice via a per-user prompt and knowledge base.
Legacy audio repositories are being archived as historical pointers after the May 2026 consolidation audit. Treat this repo as the source of truth for WhisperForge implementation, roadmap, and issue work.
Audio in
- MP3 / WAV / OGG / M4A upload OR record live in the browser via
st.audio_input— no external recorder needed. - Four transcription backends: OpenAI cloud Whisper (
gpt-4o-mini-transcribedefault), MLX Whisper (Apple Silicon native), whisper.cpp (CPU/Metal), WhisperX with word-level timestamps + optional pyannote speaker diarization. Seedocs/TRANSCRIPTION-PROVIDER-MATRIX-2026-05-18.mdbefore adding another provider or changing defaults. - Silero VAD-based chunker (opt-in) cuts on silences instead of byte size.
Pipeline
- Three LLM provider lanes: OpenAI, Anthropic, or local Ollama (auto-discovers installed models at UI load).
- Default: Claude Haiku 4.5 — fast, cheap, in your voice.
- Stages: transcript cleanup → chapters → wisdom → outline → social → image prompts → article → optional critique/revise → optional fact-check → optional image generation.
- Agentic drafting: draft → critique → revise for publishable long-form.
- Fact-check pass: flags article claims not grounded in the transcript.
- Chapterization: topical segmentation with
[M:SS]timestamps when WhisperX is the backend; JSON-schema-enforced output. - Anthropic prompt caching on the knowledge-base prefix — ~67% input- token cost reduction per 5-stage run.
- Per-user voice: prompts + knowledge base live under
prompts/<user>/and are injected into the system prompt on every call. - Text-input mode: paste prose instead of uploading audio and run the same pipeline (useful for imported transcripts).
- Capture inbox: Wispr Flow dictation, pasted notes, uploads, and browser
recordings become durable capture records under
.cache/captures/, then link to run artifacts and exports. - KB health + retrieval inspection: the Knowledge Base dialog inventories files, flags stale/duplicate/private-looking signals, and can show the exact RAG chunks, scores, and voice anchors selected for a sample query.
- Recipe command palette: choose reusable workflows such as article with receipts, brief/social pack, or issue handoff; recipes apply saved defaults while preserving the manual controls.
- Composition review: the Output card includes a Review tab where the draft sits beside source receipts, excerpts, claim flags, revision notes, compare variants, and persona variants before export.
- Run story timeline: the Review tab summarizes capture, transcription, KB context, composition, review, export, and handoff state so collaborators can see what happened without spelunking run artifacts.
- Advisory scorecards: deterministic, credential-free checks summarize voice, grounding, usefulness, recipe compliance, and handoff readiness without blocking saves.
- Agent handoff drafts: the Review tab can turn a capture, transcript, or selected output into a GitHub/Linear-ready issue brief, persist it under the current run artifacts, and route it to GitHub/Linear or a local follow-up queue after explicit approval. If external config is missing, approval stays dry-run and surfaces the reason.
- Run workspace: the Runs dialog reads
.cache/runsmanifests, flags partial/error runs, and can reopen completed outputs so Markdown, Notion, or handoff exports can be retried without regenerating content. - Resurfacing digest:
make digestgenerates a local report-only digest of notable captures, unresolved follow-ups, strong outputs, stale drafts, and reusable source nuggets. - SongForge creative lane: the SongForge recipe turns a capture plus selected KB context into original lyrics, a spoken-word variant, a service-agnostic music prompt pack, and source notes.
Image generation
- Google Nano Banana (Gemini 2.5 Flash Image) turns the
image_promptsstage output into real PNGs — ~$0.039/image flash, ~$0.10/image pro. - Four style presets (
kk,hopecode,bcai,upgrade) with brand-specific prompt suffixes; seestyles/image_styles.yaml. - Aspect-ratio presets (16:9, 1:1, 9:16) for LinkedIn, Instagram, Stories.
Output
- Structured Notion export: collapsible toggles per section, color-coded,
AI-generated title + tags + summary, Chapters with
[M:SS]jump prefixes, Revision Notes + Fact Check toggles when agentic/fact-check ran. - Markdown + Notion scorecards: advisory verdicts travel with saved/exported runs and show up in Recent Runs.
- Cost tracker (session total + cache savings) and Run history (clickable list of recent Notion pages) in the sidebar.
Deployment
- Single-process monolith for local use.
- Microservices via
docker compose— same UI, FastAPI workers wrap the same shared-logic package.
Three zones: header, sidebar + main stream, fixed bottom status bar.
┌──────────────────────────────────────────────────────────────────┐
│ WhisperForge // Control_Center Wed 19 Apr · 17:00 │
├──────────┬───────────────────────────────────────────────────────┤
│ PROFILE │ ┌─ Input ─────────────────────────────────────────┐ │
│ KK ▾ │ │ [📂 Upload] [🎙 Record] [✎ Paste] │ │
│ │ └─────────────────────────────────────────────────┘ │
│ PROVIDER │ ┌─ Pipeline ──────────────────────────────────────┐ │
│ Haiku ▾ │ │ ● Transcribe ● Clean ● Wisdom ○ Article ○ ... │ │
│ │ │ Streaming status logs (st.status) │ │
│ ⚙ More │ └─────────────────────────────────────────────────┘ │
│ 📜 Runs │ ┌─ Output ────────────────────────────────────────┐ │
│ ✎ Prompts│ │ 📝 Article · 🎯 Wisdom · 🗺 Outline · 🖼 Images │ │
│ 📚 KB │ │ 👍👎 feedback per section · 📤 Save to Notion │ │
│ │ └─────────────────────────────────────────────────┘ │
├──────────┴───────────────────────────────────────────────────────┤
│ Cost $0.12 · Calls 4 · Cache saved $0.03 · Claude Haiku 4.5 │
└──────────────────────────────────────────────────────────────────┘
All heavy trees (Prompt editor, Knowledge Base manager, Run history) live
in @st.dialog modals triggered from the sidebar — the sidebar itself
stays compact. Generation Settings (cleanup, chapters, agentic, fact-check,
images + style/ratio/model) live in the ⚙ More popover.
The bottom bar polls cost + cache savings every 2 s via @st.fragment
without rerunning the pipeline.
whisperforge_core/ pure-logic package (no Streamlit)
├── config.py env, LLM catalog, defaults
├── logging.py logger setup
├── cache.py file-hash pickle cache (sha256 + model + prompt)
├── prompts.py user/KB discovery, override precedence
├── audio.py chunking + Whisper transcription
├── llm.py unified generate() for OpenAI/Anthropic/Ollama
├── notion.py ContentBundle + 1900-char block chunker
├── pipeline.py orchestration with progress callback
├── run_story.py human-readable run timeline summaries
├── adapters.py Local{Transcriber,Processor,Storage}
└── http_adapters.py Http{Transcriber,Processor,Storage}
app.py Streamlit UI shell (imports whisperforge_core)
ui/output.py output tabs, export bar, bundle/save orchestration
ui/review.py Review tab evidence, run story, scorecards, handoffs
styles.py all CSS
whisperforge.py 45-line CLI wrapper for transcription only
services/
├── transcription/service.py POST /transcribe → audio.transcribe_audio_detailed
├── processing/service.py POST /generate, /pipeline → llm / pipeline
├── storage/service.py POST /save → notion.create_page
└── frontend/Dockerfile builds root app.py with DEPLOY_MODE=services
shared/ cross-service config + X-API-Key auth
tests/ 302 tests + health/rendered UI smokes
prompts/<user>/ profile.yaml, prompt .md files, knowledge_base,
personas, custom_prompts
The monolith (streamlit run app.py) imports whisperforge_core directly. In
services mode (docker compose up), the same app.py runs inside a frontend
container and talks to the three FastAPI services over HTTP via
http_adapters. Swap by setting DEPLOY_MODE=direct (default) or services.
Direct mode is the primary product surface. Services mode now shares the modern
transcription/processing/storage payload contract, including optional
timestamped segments and language fields when the backend emits rich
transcription details.
- Python 3.10+ (tested on 3.11 and 3.13;
audioop-ltsshim pinned for 3.13) ffmpegonPATH(brew install ffmpegon macOS)- OpenAI and Notion API keys; Anthropic key optional but recommended
git clone <your-fork-url> spektorAI
cd spektorAI
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txtOPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-...
NOTION_API_KEY=secret_...
NOTION_DATABASE_ID=<your-database-id>
# Only needed for image generation (Nano Banana / Gemini):
GOOGLE_API_KEY=AIzaSy...
# Only needed for services mode (docker compose):
SERVICE_TOKEN=<any-shared-secret>If you haven't created a Notion database yet, invite your integration to any
Notion database with Name and Tags properties — WhisperForge will
auto-discover it via the integration's search API.
mkdir -p prompts/<YourName>/{knowledge_base,personas,custom_prompts}Drop prompt templates directly into prompts/<YourName>/<type>.md, optional
persona directives into prompts/<YourName>/personas/*.md, and voice/style
docs into prompts/<YourName>/knowledge_base/. Use
prompts/<YourName>/profile.yaml when a profile needs manifest-defined prompt
or persona overrides.
make app
# → http://localhost:8501Override the port with PORT=8502 make app. The target defaults missing API
keys to dummy values so agents can verify the Streamlit shell without touching
real services.
For browser-level localhost verification (Playwright required):
make browser-e2e
make browser-e2e-freshmake browser-e2e-fresh uses recorded fixture payloads from
tests/fixtures/browser_e2e_fresh_run.json so it runs without live provider
credentials.
If Playwright is missing, install it first:
pip install playwright
playwright install chromiummake services-run
# → http://localhost:8501Four containers come up: transcription, processing, storage, frontend.
The frontend's DEPLOY_MODE=services env var tells it to HTTP-call the
backends.
Use make services-smoke for an operations smoke: it builds and starts the
compose stack, waits for container health checks, curls the frontend
/_stcore/health endpoint, and then stops the stack. Services mode requires
Docker and a local .env with SERVICE_TOKEN.
python whisperforge.py path/to/audio.m4a [transcript.txt]You can run the whole transcription + content pipeline on-device, with only Notion requiring network access:
# 1. Pull an LLM via Ollama (llama3 works; larger models write better)
ollama pull llama3
# 2. Enable the local backends
export TRANSCRIPTION_BACKEND=mlx # Apple Silicon
export MLX_WHISPER_MODEL=mlx-community/whisper-medium-mlx # medium recommended
# 3. Run normally
streamlit run app.py
# then in the sidebar: AI Provider → "Ollama (local)" → pick a modelThe installed Ollama models are auto-discovered at sidebar load, so new
ollama pulls appear without restarting. Set
WHISPERFORGE_DISCOVER_OLLAMA=0 to skip local discovery in hermetic tests or
demos. Transcription backends available:
openai (cloud default), mlx (local Apple Silicon via mlx-whisper), and
whisper_cpp (local via the whisper-cli binary; set WHISPER_CPP_MODEL to
a ggml bin path), and whisperx (word timestamps plus optional pyannote
diarization). Provider tradeoffs and candidate integrations are tracked in
docs/TRANSCRIPTION-PROVIDER-MATRIX-2026-05-18.md.
- Pick your user profile in the sidebar. Any directory under
prompts/becomes a profile. - Select provider + model (OpenAI, Anthropic, or Ollama).
- Upload audio, paste text, or paste Wispr Flow dictation. Each submitted input is normalized into a capture record.
- Generate — run individual stages (Extract Wisdom, Create Outline, Social, Image Prompts, Full Article), use "I'm Feeling Lucky" for manual settings, or pick a recipe from the command palette and run it from the current capture/input.
- Save to Notion — builds a single page with color-coded toggles per
section, an AI-generated title, summary callout, tags, run metrics, and
metadata footer. A local markdown export can be written alongside the
Notion save, and each run writes inspectable artifacts under
.cache/runs/.
Use the Output card's Review tab when a draft needs source-grounded judgment before publishing. It summarizes available receipts, transcript/KB excerpts, fact-check flags, revision notes, compare output, and persona variants, plus an advisory scorecard for voice, grounding, usefulness, recipe compliance, and handoff readiness. It is a review surface, not a collaborative editor; edits still happen by rerunning with adjusted prompts, recipes, or settings. The same tab includes an Agent handoff draft preview that writes local issue drafts and can create a GitHub/Linear issue or append to a local follow-up queue after explicit approval. When routing config is missing, approval remains dry-run and shows the blocker. Use the sidebar's Runs dialog to reopen completed run artifacts into this output/review surface and retry downstream exports. Partial runs are listed with their last stage and errors, but arbitrary stage replay is still out of scope.
WHISPER: <AI-generated title>
💜 callout with one-sentence summary
─── divider ───
▶ Chapters (when enabled)
▶ Transcription (default toggle)
▶ Wisdom (brown)
▶ Socials (orange)
▶ Image Prompts (green)
▶ Outline (blue)
▶ Draft Post (purple)
▶ Article · <compare> (when A/B compare ran)
▶ Persona · <name> (one per selected persona)
▶ Scorecard (advisory verdict + dimension scores)
▶ Original Audio (red; only if audio was uploaded)
▶ Source Receipts (transcript/evidence/composition review receipts)
▶ Run metrics (cost, cache, settings, duration)
─── Metadata ───
Original Audio · Created · Models Used · Estimated Tokens
Properties: Name, Tags (multi-select, AI-generated)
The sidebar Runs dialog reads local manifests from .cache/runs/, not only
successful Notion history. It shows status, created/updated timestamps, input
type, recipe/settings, exports, partial-state flags, and errors. Completed runs
with a saved session_output stage can be reopened into the Output/Review
surface; from there, retry safe downstream work such as Markdown export, Notion
save, or agent handoff draft generation. Partial runs are visible for diagnosis,
but arbitrary stage replay and destructive cleanup are intentionally out of
scope.
Four override layers per user, resolved in precedence order:
prompts/<user>/profile.yamlprompt/persona entries — highest priority.prompts/<user>/custom_prompts/<type>.txt— saved by the in-app "Save Prompt" button.prompts/<user>/<type>.md— on-disk template.whisperforge_core.config.DEFAULT_PROMPTS[<type>]— fallback.
Valid <type> values: transcript_cleanup, chapters,
chapters_timestamped, wisdom_extraction, summary, outline_creation,
social_media, image_prompts, article_writing, article_critique,
article_revise, article_fact_check, and seo_analysis.
Knowledge base files at prompts/<user>/knowledge_base/*.{md,txt} are
prepended to the system prompt on every LLM call, so the model writes in your
voice and with your context.
Use the sidebar's KB dialog to inventory the profile's KB files and catch empty, duplicate, stale, oversized, or private-looking docs before they confuse the pipeline. Use Benchmark to compare whole-KB injection vs RAG and inspect the concrete retrieved chunks for a stage/query.
The KB dialog also writes prompts/<user>/knowledge_base/governance.yaml.
Use canonical_files for curated voice anchors that should be prioritized and
labeled in prompts. Use ignored_files for stale, duplicate, private-looking,
or oversized files that should remain auditable but stay out of generation.
Unignored stale/private/duplicate/oversized findings surface as a concise run
warning before generation starts, so future agents should treat them as review
work instead of silently trusting the context.
Persona files at prompts/<user>/personas/*.md appear beside the built-in
persona variants in Generation Settings.
Recipes are small YAML or JSON manifests that describe reusable command
palette workflows. Defaults live in
whisperforge_core/recipe_defaults.yaml; profile-specific recipes can be added
under prompts/<user>/recipes/*.yaml or the recipes: section of
prompts/<user>/profile.yaml. A recipe can declare inputs, stages,
defaults (provider, model, kb_mode, article_length, cleanup,
chapters, agentic, fact_check, images, auto_export_markdown),
output_sections, eval_checks, and handoff_targets.
profile.yaml can also act as a small profile operating-system manifest. These
fields are additive; profiles without them still load normally. Use the KB
dialog's Profile OS summary to inspect effective project context, defaults,
KB packs, privacy notes, handoff targets, and validation warnings.
display_name: KK
project:
name: WhisperForge
context: Source-grounded content and agent handoffs
defaults:
provider: Anthropic
model: claude-haiku-4-5
kb_mode: auto
kb_packs:
voice:
files:
- knowledge_base/voice.md
privacy:
notes: Internal drafts until explicitly exported
handoff_targets:
- github
- linear
- markdownpip install -r requirements-dev.txt
make test # unit tests
make lint # Python syntax + high-signal Ruff checks
make docs-check # docs links, Makefile refs, and freshness fields
make eval-fixture # credential-free editorial/source-receipt fixture
make digest # report-only resurfacing digest from local artifacts
make smoke # boots streamlit, hits /_stcore/health
SMOKE_PORT=8601 make smoke
venv/bin/python tests/ui_smoke.py # renders the Streamlit shell without a browser driver
venv/bin/python scripts/seed_demo_dataset.py # seed article, SongForge, and partial demo runsRun make help for the full operations command list. The Makefile is the
preferred command surface for agents and CI snippets; the underlying commands
remain plain pytest, streamlit, and docker compose.
Run make digest weekly, or after a heavy capture/editing session, to surface
useful signal without routing it anywhere. The digest is local and report-only:
it links back to captures, run manifests, and exports, but by default it does
not email, post, create issues, or schedule follow-ups. Approved local routing
requires both --route-to and --approve-routing; without approval the routing
result stays a visible dry-run. By default it filters known smoke/demo captures
so real session signal stays readable; use
venv/bin/python scripts/resurfacing_digest.py --include-all-captures to
include everything.
- Fixing bugs in the pipeline? Edit
whisperforge_core/— all business logic lives there. The monolith and services both pick up the change. - Tweaking UI or CSS? Edit
ui/for Streamlit controls/layout andstyles.pyfor CSS;app.pyis only the composition root. - Changing Notion layout?
whisperforge_core/notion.py. Preserve the 1900-char block chunker —tests/test_notion.pypins that invariant. - Adding a provider? Add a branch in
whisperforge_core/llm._calland extendLLM_MODELSinconfig.py. - Changing services mode? Update both the FastAPI service model and
whisperforge_core/http_adapters.py, then add a contract test.
| Env var | Purpose | Required |
|---|---|---|
OPENAI_API_KEY |
Whisper + GPT models | yes |
ANTHROPIC_API_KEY |
Claude models | recommended |
NOTION_API_KEY |
Notion integration | yes |
NOTION_DATABASE_ID |
Destination database ID | yes |
DEPLOY_MODE |
direct (default) or services |
no |
SERVICE_TOKEN |
Shared X-API-Key for inter-service calls | services mode |
WHISPERFORGE_LOG_LEVEL |
DEBUG / INFO / WARNING (default INFO) |
no |
WHISPERFORGE_CACHE_DIR |
Cache location (default .cache/) |
no |
WHISPERFORGE_CACHE |
1 to enable the transcription/LLM cache |
no |
WHISPERFORGE_HANDOFF_DRY_RUN |
Force handoff routing dry-run (1/true) |
no |
WHISPERFORGE_HANDOFF_GITHUB_REPO |
Default GitHub repo for approved handoff issue creation (owner/name) |
no |
WHISPERFORGE_HANDOFF_LINEAR_TEAM_ID |
Default Linear team ID for approved handoff issue creation | no |
WHISPERFORGE_HANDOFF_FOLLOWUP_QUEUE_PATH |
Default local JSONL queue path for approved follow-up routing | no |
WHISPERFORGE_HANDOFF_NOTION_DRAFT_DIR |
Local directory for approved Notion task/page draft routing | no |
WHISPERFORGE_E2E_FIXTURE_PATH |
Fixture payload path for browser E2E runs (used by make browser-e2e-fresh) |
no |
WHISPERFORGE_DISCOVER_OLLAMA |
Set 0/false to skip sidebar Ollama model discovery |
no |
WF_RAG |
Force RAG on/off (1/true or 0/false) |
no |
WF_RAG_TOPK |
Retrieved KB chunks per stage (default 5) |
no |
WF_RAG_THRESHOLD |
Auto-RAG chunk threshold (default 25) |
no |
WF_EMBED_MODEL |
Sentence-transformer model for RAG | no |
TRANSCRIPTION_BACKEND |
openai (default) | mlx | whisper_cpp | whisperx |
no |
WHISPERX_MODEL |
faster-whisper model size (tiny|base|small|medium|large-v3; default small) |
no |
WHISPERX_DEVICE |
cpu (default) or cuda |
no |
WHISPERX_DIARIZATION |
1 to label speakers via pyannote |
no |
WHISPERX_HF_TOKEN |
HuggingFace token (required when diarization is on) | diar only |
GOOGLE_API_KEY |
Gemini API key for image generation | images only |
WHISPER_MODEL |
Cloud Whisper model (default gpt-4o-mini-transcribe; also gpt-4o-transcribe or whisper-1) |
no |
MLX_WHISPER_MODEL |
HF repo for mlx-whisper (default whisper-medium-mlx) | no |
WHISPER_CPP_MODEL |
Path to ggml bin for whisper_cpp backend |
whisper_cpp only |
CHUNKER |
size (default) | vad (Silero VAD, cuts on silence) |
no |
OLLAMA_BASE_URL |
Override Ollama endpoint (default http://localhost:11434/v1) |
no |
TRANSCRIPTION_URL / PROCESSING_URL / STORAGE_URL |
Override service URLs (docker compose sets these) | no |
See changelog.md. The 2026-04-19 refactor (0.2.0) collapsed
the previous two parallel implementations into a single shared-logic package,
dropped Grok, fixed requirements.txt, and added the test suite.
See ROADMAP.md for the current stabilization and product
direction. The Linear/GitHub delivery workflow lives in
docs/LINEAR-GITHUB-PIPELINE.md, with the
machine-readable backlog in ops/roadmap/features.json.
Current handoff state lives in STATUS.md.
Presentation/review flow lives in
docs/PRESENTATION-RUNBOOK-2026-05-19.md.
No open-source license is currently granted for this public repository. Until the owner adds an explicit license file, treat the code and documentation as source-available for project collaboration only, not as MIT or otherwise freely redistributable.