A small Linux/CUDA utility for turning audio files into verbatim Markdown
transcripts plus timestamped JSON sidecars. Built around whisper.cpp and
ffmpeg. Part of the raibid-labs
ecosystem.
Take an audio file → run it through ffmpeg (resample to 16 kHz mono PCM) →
run it through whisper.cpp (CUDA-accelerated on NVIDIA GPUs) → write two
files to a configurable output directory:
- <slug>-<timestamp>.md: verbatim transcript wrapped in YAML frontmatter
- <slug>-<timestamp>.json: segment-level timestamps for downstream tooling
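
Since no implementation exists yet, the following is only a rough sketch of how the pipeline above could be wired together, not the project's actual code. It builds the two external command lines and the paired output paths. The helper names (`slugify`, `output_paths`, `ffmpeg_cmd`, `whisper_cmd`), the timestamp format, and the `whisper-cli` binary name are assumptions made for illustration; the ffmpeg options (`-ar`, `-ac`, `-c:a pcm_s16le`) and the whisper.cpp options (`-m`, `-f`, `-oj`, `-of`) are the standard ones, but check them against the versions you build.

```python
import re
from datetime import datetime, timezone
from pathlib import Path


def slugify(name: str) -> str:
    """Lowercase the name and collapse non-alphanumeric runs into '-'."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return slug or "audio"


def output_paths(audio: Path, out_dir: Path, now: datetime) -> tuple[Path, Path]:
    """Return the (.md, .json) pair sharing one base name.

    The timestamp format here is an assumption, not a documented choice.
    """
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    base = f"{slugify(audio.stem)}-{stamp}"
    return out_dir / f"{base}.md", out_dir / f"{base}.json"


def ffmpeg_cmd(audio: Path, wav: Path) -> list[str]:
    """Resample any input to 16 kHz mono 16-bit PCM WAV."""
    return ["ffmpeg", "-y", "-i", str(audio),
            "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", str(wav)]


def whisper_cmd(wav: Path, model: Path, out_prefix: Path) -> list[str]:
    """Run whisper.cpp and ask for JSON output at <out_prefix>.json.

    The binary name is assumed; older whisper.cpp builds call it 'main'.
    """
    return ["whisper-cli", "-m", str(model), "-f", str(wav),
            "-oj", "-of", str(out_prefix)]
```

Each command list can be handed to `subprocess.run(cmd, check=True)`. Passing `now` in explicitly, for example `datetime.now(timezone.utc)`, keeps the naming logic deterministic and easy to test.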
- Verbatim only. No summarization, no paraphrase, no condensation, no LLM in the path. If you wanted Snipd, this isn't it.
- Single-purpose. Audio in, faithful text out. Annotation, link insertion, search, summarization — those are downstream concerns the user (or another tool) handles.
- Two-file output. Markdown for humans, JSON for tooling. Same base name, same directory.
- Configurable. Output directory, model choice, and whisper.cpp flags
surface via CLI flags and (eventually)
.fsx config.
See docs/01-architecture.md for the full
design rationale.
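
To make the two-file principle concrete, here is a minimal sketch of writing a Markdown transcript and its JSON sidecar under one base name. It is an illustration under stated assumptions, not the project's implementation: the function names, the frontmatter keys, and the segment shape (`start`, `end`, `text`) are all hypothetical, and the real schema may differ.

```python
import json
from pathlib import Path


def render_markdown(meta: dict[str, str], segments: list[dict]) -> str:
    """Join segment text verbatim under a YAML frontmatter block.

    Values are JSON-quoted, which is also valid YAML for plain strings.
    The frontmatter keys are whatever the caller passes (assumed schema).
    """
    front = "\n".join(f"{key}: {json.dumps(value)}" for key, value in meta.items())
    body = " ".join(seg["text"].strip() for seg in segments)
    return f"---\n{front}\n---\n\n{body}\n"


def write_outputs(out_dir: Path, base: str, meta: dict[str, str],
                  segments: list[dict]) -> tuple[Path, Path]:
    """Write <base>.md and <base>.json side by side in out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    md_path = out_dir / f"{base}.md"
    json_path = out_dir / f"{base}.json"
    md_path.write_text(render_markdown(meta, segments), encoding="utf-8")
    json_path.write_text(
        json.dumps({"segments": segments}, indent=2, ensure_ascii=False),
        encoding="utf-8",
    )
    return md_path, json_path
```

The text is only stripped and joined, never rewritten, which matches the verbatim contract; timestamps live solely in the sidecar so the Markdown stays readable.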
Phase 0 — repository init. No implementation yet. See issues for the work plan.
(Forthcoming once the implementation lands.)
    scribe transcribe --input podcast.mp3 --out-dir ~/transcripts

- docs/00-index.md: index and overview
- docs/01-architecture.md: design rationale, the verbatim contract, dual-output design, why no LLM is in the path
- docs/02-roadmap.md: planned input adapters, output formats, deferrals, explicit non-goals
- docs/03-related-tools.md: how Scribe fits with other raibid-labs projects (Scryforge, voice-stuff, Phage, gudpkm n8n)
- docs/04-cli-reference.md: subcommands, flags, env vars, exit codes, output layout
- docs/05-contributing.md: dev environment setup, code style, PR conventions, how to extend inputs and outputs
- docs/06-troubleshooting.md: common failure modes and fixes
- docs/07-self-hosted-ci.md: self-hosted GitHub Actions runner for the end-to-end integration test, security model, host hardening
- Scryforge: Fusabi-powered TUI information rolodex (RSS, email, YouTube, Spotify, Reddit, bookmarks). Will eventually invoke Scribe as the engine behind a transcribe-to-vault action on podcast and YouTube items.
- voice-stuff (legacy, local): Python push-to-talk dictation prototype. Shares the underlying whisper.cpp build with Scribe. Slated for a Rust rewrite as Murmur (dictation) plus a separate voice-agent project.
- Phage: context composition engine. Pattern reference for Scribe's eventual .fsx config layer (Fusabi-as-config-DSL).
- gudpkm n8n stack: the voice_memo workflow has a documented TODO to call Scribe for the audio-upload branch of its webhook.
See docs/03-related-tools.md for the full
picture.
Dual-licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT License (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.