A CLI tool to transcribe your spoken audio notes into timestamped, multilingual Markdown: offline, accurate, and feedback-driven.
- Transcribe audio files in various formats (mp3, wav, ogg, etc.).
- Automatically detect the language of the audio.
- Save transcriptions to a database for later retrieval.
- Output transcriptions to a file or the console.
- Transcripts can be saved to date-based Markdown files (e.g., YYYY-MM-DD.md).
- Each audio file's transcript is organized into timestamped H2-level sections within the output file.
- Improve the quality of transcriptions based on feedback.
- Accept commands from the Markdown files it generates, such as:
  - re-transcribe in a different language
  - learn a correction
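The date-based layout described above can be sketched in Python as follows. This is a minimal illustration, not SpeechDown's implementation: the `Transcript` class, the `group_by_day` function, and the `## HH:MM filename` heading format are hypothetical, and each transcript is assumed to be dated by its recording time.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Transcript:
    audio_file: str        # source audio file name
    recorded_at: datetime  # assumed: when the note was recorded
    text: str


def group_by_day(transcripts: list[Transcript]) -> dict[str, str]:
    """Map 'YYYY-MM-DD.md' file names to Markdown bodies.

    Each audio file's transcript becomes one H2 section headed by its
    time and file name (a hypothetical heading format).
    """
    by_day: dict[str, list[Transcript]] = defaultdict(list)
    for t in transcripts:
        by_day[f"{t.recorded_at:%Y-%m-%d}.md"].append(t)

    files: dict[str, str] = {}
    for name, items in by_day.items():
        items.sort(key=lambda t: t.recorded_at)  # chronological within a day
        files[name] = "\n".join(
            f"## {t.recorded_at:%H:%M} {t.audio_file}\n\n{t.text}\n" for t in items
        )
    return files


notes = [
    Transcript("memo1.mp3", datetime(2024, 5, 1, 9, 30), "Buy milk."),
    Transcript("memo2.wav", datetime(2024, 5, 1, 8, 15), "Call Anna."),
    Transcript("memo3.ogg", datetime(2024, 5, 2, 18, 0), "Draft the report."),
]
for name, body in group_by_day(notes).items():
    print(f"--- {name}\n{body}")
```

Grouping by calendar day keeps one file per day, while sorting within each day keeps the H2 sections in recording order even when files are processed out of order.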
Install SpeechDown locally for development:
make requirements
This installs all required dependencies, including development tools.
Install SpeechDown via uvx for stable automation:
# Install SpeechDown from GitHub
uvx --from git+https://github.com/dudarev/speechdown sd
# Verify installation
sd --help
To initialize a SpeechDown project, run:
sd init
This will create a .speechdown directory in the current working directory, which will contain the database and configuration files. The configuration file (config.json) will include a default output_dir setting (transcripts/) where transcription files will be saved.
Note: Most commands accept a -d or --directory option for operating in a directory other than the current working directory. See the Options section below for details.
You can configure the output directory for transcripts using the sd config command:
sd config --output-dir path/to/your/transcripts
If output_dir is not set, or if the path is invalid, SpeechDown outputs transcriptions to standard output. Because sd init sets output_dir to transcripts/, transcripts are saved by default to a transcripts/ subdirectory within the initialized SpeechDown project directory.
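The fallback rules above can be expressed as a small resolution function. This is a sketch under stated assumptions, not SpeechDown's code: `resolve_output_dir` is hypothetical, a relative output_dir is assumed to be resolved against the project directory, and "invalid" is assumed to mean a path that exists but is not a directory. A return value of `None` stands for writing to standard output.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def resolve_output_dir(project_dir: Path) -> Optional[Path]:
    """Return where transcripts should be written, or None for standard output."""
    config_file = project_dir / ".speechdown" / "config.json"
    try:
        config = json.loads(config_file.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return None  # no readable config: fall back to stdout

    output_dir = config.get("output_dir") if isinstance(config, dict) else None
    if not output_dir:
        return None  # output_dir not set: fall back to stdout

    path = Path(output_dir)
    if not path.is_absolute():
        path = project_dir / path  # assumed: relative to the project directory
    if path.exists() and not path.is_dir():
        return None  # assumed meaning of "invalid": exists but is not a directory
    return path


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    config_file = project / ".speechdown" / "config.json"
    config_file.parent.mkdir()
    no_config = resolve_output_dir(project)  # None: nothing configured yet

    config_file.write_text(json.dumps({"output_dir": "transcripts/"}), encoding="utf-8")
    default_dir = resolve_output_dir(project)  # <project>/transcripts

    (project / "notes.txt").write_text("not a directory", encoding="utf-8")
    config_file.write_text(json.dumps({"output_dir": "notes.txt"}), encoding="utf-8")
    invalid_dir = resolve_output_dir(project)  # None: falls back to stdout

print(no_config, default_dir.name, invalid_dir)  # None transcripts None
```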
SpeechDown supports multiple languages for transcription. You can configure which languages to use with the following commands:
- View current language configuration:
  sd config
- Set specific languages (replaces existing languages):
  sd config --languages en,fr,de
- Add a single language to the configuration:
  sd config --add-language ja
- Remove a language from the configuration:
  sd config --remove-language fr
SpeechDown supports all languages available in the Whisper model, including but not limited to:
- English (en)
- French (fr)
- German (de)
- Spanish (es)
- Chinese (zh)
- Japanese (ja)
- Ukrainian (uk)
- And many more
Using the correct language codes improves transcription accuracy and performance.
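The difference between the --languages, --add-language, and --remove-language options of sd config can be sketched as three small list operations. The function names are hypothetical, and the de-duplication and order-preserving behavior shown here are assumptions rather than documented SpeechDown behavior.

```python
def set_languages(codes: str) -> list[str]:
    """--languages en,fr,de: replace the whole list (order kept, duplicates dropped)."""
    result: list[str] = []
    for code in codes.split(","):
        code = code.strip()
        if code and code not in result:
            result.append(code)
    return result


def add_language(languages: list[str], code: str) -> list[str]:
    """--add-language ja: append one code if it is not already present."""
    return languages if code in languages else [*languages, code]


def remove_language(languages: list[str], code: str) -> list[str]:
    """--remove-language fr: drop one code, keeping the others in order."""
    return [lang for lang in languages if lang != code]


langs = set_languages("en,fr,de")
langs = add_language(langs, "ja")
langs = remove_language(langs, "fr")
print(langs)  # ['en', 'de', 'ja']
```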
To transcribe all audio files in the current directory, run:
sd transcribe
This will transcribe all supported audio files found in the current directory and its subdirectories. Transcripts will be saved to files in the configured output directory (output_dir).
The following options are available:
- --debug: Enable debug mode for more verbose output.
- --dry-run: Simulate the transcription process without making any changes to the database or file system.
- --within-hours: Only transcribe files modified within the last N hours.
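Recursive file discovery and the --within-hours filter can be sketched as below. This is illustrative only: `find_audio_files`, `modified_within`, and `AUDIO_EXTENSIONS` are hypothetical names, the extension set covers only the formats named earlier in this README (mp3, wav, ogg), and measuring "modified" by the file's modification time (st_mtime) is an assumption.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import Optional

# Assumed extension set: only the formats named in this README.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".ogg"}


def find_audio_files(root: Path) -> list[Path]:
    """Return supported audio files under root, including subdirectories."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


def modified_within(paths: list[Path], hours: Optional[float],
                    now: Optional[float] = None) -> list[Path]:
    """Keep files modified in the last `hours` hours; None disables the filter."""
    if hours is None:
        return list(paths)
    now = time.time() if now is None else now
    cutoff = now - hours * 3600
    return [p for p in paths if p.stat().st_mtime >= cutoff]


# Example: one recent file, one two-day-old file, and one non-audio file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    for name in ("today.mp3", "sub/old.WAV", "notes.txt"):
        (root / name).write_bytes(b"")
    now = time.time()
    two_days_ago = now - 48 * 3600
    os.utime(root / "sub" / "old.WAV", (two_days_ago, two_days_ago))

    found = [p.relative_to(root).as_posix() for p in find_audio_files(root)]
    recent = [p.name for p in modified_within(find_audio_files(root), hours=24, now=now)]

print(found)   # ['sub/old.WAV', 'today.mp3']
print(recent)  # ['today.mp3']
```

Passing `now` explicitly keeps the filter deterministic in tests; leaving it out uses the current time.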
For most commands, you can specify a directory to operate in using the -d or --directory option, for example:
sd transcribe -d path/to/your/project
If you do not specify this option, SpeechDown will use the current working directory by default.
See the Makefile for development commands.
For information on how tasks and features are tracked and prioritized, see ADR-006.
When changes are pushed to GitHub, a CI pipeline runs the make ci command to validate the code.
For local development, it's recommended to run:
make ci-local
This command runs all tests, including additional integration tests that aren't included in the standard CI pipeline.