SpeechDown

CLI tool to transcribe your spoken audio notes into timestamped, multilingual Markdown: offline, accurate, and feedback-driven.

Features

Transcribe audio files in various formats (mp3, wav, ogg, etc.).
Automatically detect the language of the audio.
Save transcriptions to a database for later retrieval.
Output transcriptions to a file or the console.
- Transcripts can be saved to date-based Markdown files (e.g., YYYY-MM-DD.md).
- Each audio file's transcript is organized into timestamped H2-level sections within the output file.
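
The date-based layout described above can be illustrated with a minimal sketch. This is not SpeechDown's actual implementation: the function name `append_transcript` and the exact heading format are assumptions made for illustration. Only the YYYY-MM-DD.md naming and the H2-level timestamped sections come from the description above.

```python
from datetime import datetime
from pathlib import Path


def append_transcript(output_dir: Path, recorded_at: datetime, text: str) -> Path:
    """Append one transcript as a timestamped H2 section to a date-based file."""
    output_dir.mkdir(parents=True, exist_ok=True)
    # One file per calendar day, named YYYY-MM-DD.md.
    target = output_dir / f"{recorded_at:%Y-%m-%d}.md"
    # Assumed heading format; the real tool may render timestamps differently.
    section = f"## {recorded_at:%Y-%m-%d %H:%M:%S}\n\n{text.strip()}\n\n"
    with target.open("a", encoding="utf-8") as handle:
        handle.write(section)
    return target
```

Two recordings made on the same day end up as two H2 sections in the same file, which is why the sketch opens the file in append mode.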

Future plans

Improve quality of transcriptions based on feedback.
Accept commands from the Markdown files it generated such as:
- re-transcribe in different language
- learn correction

Installation

For Development

Install SpeechDown locally for development:

make requirements

This installs all required dependencies including development tools.

For Production Use

Install SpeechDown via uvx for stable automation:

# Install SpeechDown from GitHub
uvx --from git+https://github.com/dudarev/speechdown sd

# Verify installation
sd --help

Usage

Initialization

To initialize a SpeechDown project, run:

sd init

This will create a .speechdown directory in the current working directory, which will contain the database and configuration files. The configuration file (config.json) will include a default output_dir setting (transcripts/) where transcription files will be saved.
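
As a rough illustration of how a script might read that setting, here is a small sketch. It assumes config.json is a flat JSON object with an output_dir key, as described above; any other keys and the helper name `read_output_dir` are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def read_output_dir(project_dir: Path) -> Optional[str]:
    """Return the output_dir stored in .speechdown/config.json, if any."""
    config_path = project_dir / ".speechdown" / "config.json"
    if not config_path.is_file():
        return None  # the project has not been initialized here
    config = json.loads(config_path.read_text(encoding="utf-8"))
    # Only the output_dir key is taken from the README; the rest is unknown.
    return config.get("output_dir")
```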

Note: It is possible to specify a different directory for most commands using the -d or --directory option, if you want to operate in a directory other than the current working directory. See the Options section below for details.

Configuration

You can configure the output directory for transcripts using the sd config command:

sd config --output-dir path/to/your/transcripts

By default, transcripts are saved to a transcripts/ subdirectory within the initialized SpeechDown project directory. If output_dir is not set, or if the path is invalid, SpeechDown writes transcriptions to standard output instead.
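
The fallback to standard output can be sketched as follows. This is an assumption-laden illustration rather than the tool's code: it treats "invalid" as "not an existing directory", and the name `write_transcript` is hypothetical.

```python
import sys
from pathlib import Path
from typing import Optional


def write_transcript(text: str, output_dir: Optional[str], filename: str) -> Optional[Path]:
    """Write to output_dir/filename, or to stdout when output_dir is unusable."""
    if output_dir:
        directory = Path(output_dir)
        # Assumption: a path is "valid" when it is an existing directory.
        if directory.is_dir():
            target = directory / filename
            target.write_text(text, encoding="utf-8")
            return target
    # output_dir is unset or unusable: fall back to standard output.
    sys.stdout.write(text)
    return None
```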

Language Configuration

SpeechDown supports multiple languages for transcription. You can configure which languages to use with the following commands:

View current language configuration:

sd config

Set specific languages (replaces existing languages):

sd config --languages en,fr,de

Add a single language to the configuration:

sd config --add-language ja

Remove a language from the configuration:

sd config --remove-language fr
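
The difference between the three options (replace, add, remove) can be modeled on a plain list of language codes. The helper names below are hypothetical, and skipping duplicates on add is an assumption, not documented behavior.

```python
from typing import List


def set_languages(value: str) -> List[str]:
    """Model of --languages: replace the list with the comma-separated codes."""
    return [code.strip() for code in value.split(",") if code.strip()]


def add_language(current: List[str], code: str) -> List[str]:
    """Model of --add-language: append one code (assumed to skip duplicates)."""
    return current if code in current else current + [code]


def remove_language(current: List[str], code: str) -> List[str]:
    """Model of --remove-language: drop one code, leaving the rest in order."""
    return [existing for existing in current if existing != code]
```

Applying the three commands shown above in order would, under this model, take the list from en, fr, de to en, de, ja.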

SpeechDown supports all languages available in the Whisper model, including but not limited to:

English (en)
French (fr)
German (de)
Spanish (es)
Chinese (zh)
Japanese (ja)
Ukrainian (uk)
And many more

Using the correct language codes improves transcription accuracy and performance.

Transcription

To transcribe all audio files in the current directory, run:

sd transcribe

This will transcribe all supported audio files found in the current directory and its subdirectories. Transcripts will be saved to files in the configured output_dir.
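
To preview which files a recursive scan would pick up, a sketch along these lines may help. It is not SpeechDown's discovery logic: the suffix set contains only the three formats named under Features, and `find_audio_files` is a hypothetical helper.

```python
from pathlib import Path
from typing import List

# Only the formats named in the Features section; the tool may support more.
AUDIO_SUFFIXES = {".mp3", ".wav", ".ogg"}


def find_audio_files(root: Path) -> List[Path]:
    """Return audio files under root, including subdirectories, in sorted order."""
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in AUDIO_SUFFIXES
    )
```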

Options

--debug: Enable debug mode for more verbose output.
--dry-run: Simulate the transcription process without making any changes to the database or file system.
--within-hours: Only transcribe files modified within the last N hours.
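
The --within-hours filter amounts to a cutoff on file modification time. A minimal sketch, assuming the comparison is made against the file's mtime and using a hypothetical helper name:

```python
import time
from pathlib import Path
from typing import Iterable, List, Optional


def modified_within_hours(
    paths: Iterable[Path], hours: float, now: Optional[float] = None
) -> List[Path]:
    """Keep only the paths whose modification time is within the last N hours."""
    reference = time.time() if now is None else now
    cutoff = reference - hours * 3600  # seconds since the epoch
    return [path for path in paths if path.stat().st_mtime >= cutoff]
```

The optional `now` parameter exists only to make the sketch easy to test with fixed timestamps.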

Directory Option

For most commands, you can specify a directory to operate in using the -d or --directory option, for example:

sd transcribe -d path/to/your/project

If you do not specify this option, SpeechDown will use the current working directory by default.

Development

See Makefile for development commands.

Task Tracking

For information on how tasks and features are tracked and prioritized, see ADR-006.

Continuous Integration

When changes are pushed to GitHub, a CI pipeline runs the make ci command to validate the code.

For local development, it's recommended to run:

make ci-local

This command runs all tests, including additional integration tests that aren't included in the standard CI pipeline.
