stt-bench

stt-bench is a command-line utility for benchmarking speech-to-text models on English and multilingual datasets, with native support for Indic languages.

Results

Installation

uv pip install stt-bench

Prerequisites

Some datasets (Lahaja, Svarah) also require FFmpeg version 6 or newer.
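Because some datasets need FFmpeg 6 or newer, it can help to check the installed version before a run. This is a minimal sketch and is not part of stt-bench: the helpers `parse_ffmpeg_major` and `ffmpeg_meets_minimum` are hypothetical, and the parser assumes the usual first line of `ffmpeg -version` output (for example `ffmpeg version 6.1.1 ...`).

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_ffmpeg_major(version_output: str) -> Optional[int]:
    """Return the major version from `ffmpeg -version` output, or None if absent.

    Matches first lines like "ffmpeg version 6.1.1 ..." or "ffmpeg version n7.0 ...".
    Git snapshot builds ("ffmpeg version N-...") carry no release number and yield None.
    """
    match = re.search(r"ffmpeg version n?(\d+)\.", version_output)
    return int(match.group(1)) if match else None


def ffmpeg_meets_minimum(minimum_major: int = 6) -> bool:
    """Return True if the ffmpeg found on PATH reports a major version >= minimum_major."""
    binary = shutil.which("ffmpeg")
    if binary is None:
        return False  # FFmpeg is not installed or not on PATH
    result = subprocess.run([binary, "-version"], capture_output=True, text=True, check=False)
    major = parse_ffmpeg_major(result.stdout)
    return major is not None and major >= minimum_major


if __name__ == "__main__":
    print("FFmpeg 6+ available:", ffmpeg_meets_minimum())
```

Snapshot builds report no release number, so the sketch treats them as not meeting the minimum; such builds may still work in practice.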

Models

Provider	Model	Status
Kalpa Labs	Proprietary models	Coming soon...
OpenAI	gpt-4o-transcribe, whisper-1	✅
Deepgram	Nova-3	✅
Sarvam	Sarika	✅
Gemini	Flash, Pro*	✅

*Gemini 2.5 Pro is not suitable for real-time conversations because thinking mode is enabled by default.

Supported Datasets

Dataset	Languages	Description
IndicVoices	22 scheduled languages of the Indian Constitution [1]	Large-scale multilingual speech corpus for Indian languages
Lahaja	Hindi	Hindi speech in diverse dialects from 132 speakers across 83 districts
Svarah	English	English speech in diverse dialects from 117 speakers across 65 districts
Fleurs	102 languages, including major Indian languages	For Indian languages, English sentences are translated into each language and recorded by native speakers
Vaani	Almost all spoken Indian languages, including the 22 scheduled languages	Speakers across India are shown an image and asked to describe it impromptu

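[1] The 22 scheduled languages of the Indian Constitution are Assamese, Bengali, Bodo, Dogri, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Maithili, Malayalam, Manipuri, Marathi, Nepali, Odia, Punjabi, Sanskrit, Santali, Sindhi, Tamil, Telugu, and Urdu.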

Usage

To see all available options, run:

stt-bench --help

1. Configure environment variables

Set the variables required by the models you plan to evaluate:

HF_TOKEN, OPENAI_API_KEY, MENKA_BASE_URL, DEEPGRAM_API_KEY, SARVAM_API_KEY, GEMINI_API_KEY
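A quick pre-flight check can catch a missing key before a long run. This is a hypothetical sketch, not stt-bench's own validation: the provider-to-variable mapping is inferred only from the variable names, `HF_TOKEN` is assumed to be needed for every run (for downloading datasets), and `MENKA_BASE_URL` is omitted because its role is not described here.

```python
import os
from typing import List, Mapping, Optional

# Hypothetical provider-to-variable mapping, inferred from the variable names only.
# MENKA_BASE_URL is left out because its role is not described in this README.
PROVIDER_ENV_VARS = {
    "openai": ["OPENAI_API_KEY"],
    "deepgram": ["DEEPGRAM_API_KEY"],
    "sarvam": ["SARVAM_API_KEY"],
    "gemini": ["GEMINI_API_KEY"],
}

# Assumed to be needed for every run (e.g. to download datasets from Hugging Face).
COMMON_ENV_VARS = ["HF_TOKEN"]


def missing_env_vars(provider: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """List required variables that are unset or empty for the given provider."""
    env = os.environ if env is None else env
    key = provider.lower()
    if key not in PROVIDER_ENV_VARS:
        raise ValueError(f"unknown provider: {provider!r}")
    required = COMMON_ENV_VARS + PROVIDER_ENV_VARS[key]
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars("openai")
    print("Missing:", ", ".join(missing) if missing else "none")
```

Passing an explicit `env` mapping keeps the function easy to test without touching the real environment.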

2. Run inference across datasets

stt-bench run --model gpt-4o-transcribe

This command writes inference outputs to inference/{model}/{dataset} for every dataset in the run; each dataset directory contains CSV files whose names end in predictions.csv. To run on a subset of datasets, list them with --eval-datasets:

stt-bench run --model gpt-4o-transcribe --eval-datasets Fleurs
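To inspect or post-process the outputs yourself, the directory layout described above can be walked with `pathlib`. This is a hypothetical helper (`find_prediction_files`), not part of stt-bench; it assumes the prediction CSVs sit directly inside each inference/{model}/{dataset} directory and makes no assumption about their column names.

```python
from pathlib import Path
from typing import Dict, List


def find_prediction_files(model_dir: Path) -> Dict[str, List[Path]]:
    """Group *predictions.csv files under inference/{model}/ by dataset directory name."""
    grouped: Dict[str, List[Path]] = {}
    for dataset_dir in sorted(p for p in model_dir.iterdir() if p.is_dir()):
        # Only files directly inside the dataset directory; use rglob for nested layouts.
        files = sorted(dataset_dir.glob("*predictions.csv"))
        if files:
            grouped[dataset_dir.name] = files
    return grouped


if __name__ == "__main__":
    root = Path("inference/gpt-4o-transcribe")
    if root.is_dir():
        for dataset, files in find_prediction_files(root).items():
            print(dataset, [f.name for f in files])
```

Dataset directories without any matching CSV are skipped, so an interrupted run shows up as a missing key rather than an empty list.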

3. Evaluate WER and CER metrics

stt-bench evaluate --dir inference/{model}

The evaluation step writes metrics/{model}/{dataset}/evaluation_metrics.csv, which reports WER and CER for each split along with overall metrics for the dataset.
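WER is commonly defined as (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions needed to turn the reference into the hypothesis and N is the number of reference words; CER applies the same formula to characters. The sketch below illustrates those definitions and is not stt-bench's implementation: it applies no text normalization (casing, punctuation), counts Unicode code points rather than grapheme clusters (which matters for Indic scripts), and pools edits across utterances for a corpus-level score, which may differ from how the tool aggregates its overall metrics.

```python
from typing import Iterable, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions + deletions + insertions) between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: reference token missing from hypothesis
                curr[j - 1] + 1,         # insertion: extra hypothesis token
                prev[j - 1] + (r != h),  # substitution, free when tokens match
            )
        prev = curr
    return prev[-1]


def corpus_wer_cer(pairs: Iterable[Tuple[str, str]]) -> Tuple[float, float]:
    """Corpus-level WER and CER: total edits divided by total reference words / characters."""
    word_edits = word_total = char_edits = char_total = 0
    for reference, hypothesis in pairs:
        ref_words, hyp_words = reference.split(), hypothesis.split()
        word_edits += edit_distance(ref_words, hyp_words)
        word_total += len(ref_words)
        # Characters are compared after collapsing runs of whitespace to single spaces.
        ref_chars, hyp_chars = " ".join(ref_words), " ".join(hyp_words)
        char_edits += edit_distance(ref_chars, hyp_chars)
        char_total += len(ref_chars)
    if word_total == 0 or char_total == 0:
        raise ValueError("references must be non-empty")
    return word_edits / word_total, char_edits / char_total
```

For example, a reference of "the cat sat" transcribed as "the cat sit" has one word substitution out of three words (WER 1/3) and one character substitution out of eleven characters (CER 1/11).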
