31 changes: 22 additions & 9 deletions .github/workflows/ci.yml
@@ -7,7 +7,7 @@ on:
   pull_request:

 jobs:
-  lint-and-test-default:
+  lint-and-test-portable:
     runs-on: ubuntu-latest
     steps:
       - name: Checkout
@@ -29,11 +29,11 @@ jobs:
       - name: Check formatting
         run: cargo fmt --all -- --check

-      - name: Clippy (default features)
-        run: cargo clippy --all-targets -- -D warnings
+      - name: Clippy (CPU-only OSD build)
+        run: cargo clippy --all-targets --no-default-features --features osd -- -D warnings

-      - name: Test (default features)
-        run: cargo test
+      - name: Test (CPU-only OSD build)
+        run: cargo test --no-default-features --features osd

   feature-checks:
     runs-on: ubuntu-latest
@@ -55,13 +55,26 @@ jobs:
       - name: Check no default features
         run: cargo check --no-default-features

-      - name: Check osd feature only
+      - name: Check CPU-only OSD feature set
         run: cargo check --no-default-features --features osd

-      - name: Verify package (default publish surface)
-        run: cargo package --locked
+      - name: Verify package
+        run: |
+          if command -v nvcc >/dev/null 2>&1; then
+            cargo package --locked
+          else
+            cargo package --locked --no-verify
+          fi
+
+      - name: Check default feature set (if toolkit available)
+        run: |
+          if command -v nvcc >/dev/null 2>&1; then
+            cargo check
+          else
+            echo "CUDA toolkit not available on this runner; skipping default feature check"
+          fi

-      - name: Check cuda feature only (if toolkit available)
+      - name: Check CUDA-only feature set (if toolkit available)
         run: |
           if command -v nvcc >/dev/null 2>&1; then
             cargo check --no-default-features --features cuda
90 changes: 62 additions & 28 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

16 changes: 12 additions & 4 deletions Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "whispers"
-version = "0.1.0"
+version = "0.2.0"
 edition = "2024"
 rust-version = "1.85"
 description = "Speech-to-text dictation tool for Wayland"
@@ -18,7 +18,7 @@ tokio = { version = "1", features = ["macros", "rt-multi-thread", "signal", "tim
 cpal = "0.17"

 # Whisper transcription
-whisper-rs = "0.15"
+whisper-rs = "0.16"
 llama-cpp-2 = "0.1.138"

 # uinput virtual keyboard for paste keystroke
@@ -63,11 +63,19 @@ console = "0.16"
 wayland-client = { version = "0.31", optional = true }
 wayland-protocols = { version = "0.32", features = ["client"], optional = true }
 wayland-protocols-wlr = { version = "0.3", features = ["client"], optional = true }
+font8x8 = { version = "0.3", optional = true }
+fontdue = { version = "0.9", optional = true }

 [features]
-default = ["osd"]
+default = ["cuda", "osd"]
 cuda = ["whisper-rs/cuda", "llama-cpp-2/cuda"]
-osd = ["dep:wayland-client", "dep:wayland-protocols", "dep:wayland-protocols-wlr"]
+osd = [
+    "dep:wayland-client",
+    "dep:wayland-protocols",
+    "dep:wayland-protocols-wlr",
+    "dep:font8x8",
+    "dep:fontdue",
+]

 [[bin]]
 name = "whispers"
35 changes: 16 additions & 19 deletions README.md
@@ -24,14 +24,13 @@ The two invocations communicate via PID file + `SIGUSR1` — no daemon, no IPC s
 `whispers` now has three main dictation modes:

 - `raw` keeps output close to the direct transcription result and is the default
-- `advanced_local` enables the smart rewrite pipeline after transcription; `[rewrite].backend` chooses whether that rewrite runs locally or in the cloud
-- `agentic_rewrite` uses the same local/cloud rewrite backends as `advanced_local`, but adds app-aware policy rules, technical glossary guidance, and a stricter conservative acceptance guard
+- `rewrite` runs the LLM-based rewrite pipeline after transcription; `[rewrite].backend` chooses whether that rewrite runs locally or in the cloud, and the same config also carries app-aware policy rules, glossary guidance, and correction-policy settings

 The older heuristic cleanup path is still available as deprecated `legacy_basic` for existing configs that already use `[cleanup]`.
 The local rewrite path is managed by `whispers` itself through an internal helper binary installed alongside the main executable, so there is no separate tool or daemon to install manually.
 When a rewrite mode is enabled with `rewrite.backend = "local"`, `whispers` keeps a hidden rewrite worker warm for a short idle window so repeated dictation is much faster without becoming a permanent background daemon.
 Managed rewrite models are the default path. If you point `rewrite.model_path` at your own GGUF, it should be a chat-capable model with an embedded template that `llama.cpp` can apply at runtime.
-Deterministic personalization rules apply in all modes: dictionary replacements and spoken snippets. Custom rewrite instructions apply to both rewrite modes, and `agentic_rewrite` can additionally load app rules and glossary entries from separate TOML files.
+Deterministic personalization rules apply in all modes: dictionary replacements and spoken snippets. Rewrite instructions, app rules, glossary entries, and correction-policy defaults all live under `[rewrite]`. Older `advanced_local` and `agentic_rewrite` mode names are still accepted as deprecated aliases when reading existing configs.
 Cloud ASR and cloud rewrite are both optional. Local remains the default.

 For file transcription, `whispers transcribe --raw <file>` always prints the plain ASR transcript without any post-processing.
@@ -52,32 +51,32 @@ For file transcription, `whispers transcribe --raw <file>` always prints the pla
 ### From crates.io

 ```sh
-# Default install: CPU build with Wayland OSD
+# Default install: CUDA build with Wayland OSD
 cargo install whispers

-# Enable CUDA acceleration explicitly
-cargo install whispers --features cuda
+# CPU-only build with Wayland OSD
+cargo install whispers --no-default-features --features osd

-# Build without the OSD overlay
+# CPU-only build without the OSD overlay
 cargo install whispers --no-default-features
 ```

 ### From git

 ```sh
-# Default install: CPU build with Wayland OSD
+# Default install: CUDA build with Wayland OSD
 cargo install --git https://github.com/OneNoted/whispers

-# Enable CUDA acceleration explicitly
-cargo install --git https://github.com/OneNoted/whispers --features cuda
+# CPU-only build with Wayland OSD
+cargo install --git https://github.com/OneNoted/whispers --no-default-features --features osd

-# Build without the OSD overlay
+# CPU-only build without the OSD overlay
 cargo install --git https://github.com/OneNoted/whispers --no-default-features
 ```

 ### Setup

-Run the interactive setup wizard to download a local ASR model, generate config, and optionally enable local or cloud advanced dictation. Recommended local models are shown first, and experimental backends like Parakeet are called out explicitly before you opt into them:
+Run the interactive setup wizard to download a local ASR model, generate config, optionally enable local or cloud advanced dictation, and offer shell completion install for the supported shells it finds on your `PATH`. Recommended local models are shown first, and experimental backends like Parakeet are called out explicitly before you opt into them:

 ```sh
 whispers setup
@@ -129,7 +128,7 @@ That still remains a single install: `whispers` manages local ASR models, the op

 ## Shell completions

-Print completion scripts to `stdout`:
+`whispers setup` can detect supported shells on your `PATH` and install completions for one shell or all detected shells. You can also print completion scripts to `stdout` manually:

 ```sh
 # auto-detect from $SHELL (falls back to parent process name)
@@ -196,7 +195,7 @@ flash_attn = true # only used when use_gpu=true
 idle_timeout_ms = 120000

 [postprocess]
-mode = "raw" # or "advanced_local" / "agentic_rewrite"; deprecated: "legacy_basic"
+mode = "raw" # or "rewrite"; deprecated: "legacy_basic"

 [session]
 enabled = true
@@ -220,8 +219,6 @@ timeout_ms = 30000
 idle_timeout_ms = 120000
 max_output_chars = 1200
 max_tokens = 256
-
-[agentic_rewrite]
 policy_path = "~/.local/share/whispers/app-rewrite-policy.toml"
 glossary_path = "~/.local/share/whispers/technical-glossary.toml"
 default_correction_policy = "balanced"
@@ -250,7 +247,7 @@ start_sound = "" # empty = bundled sound
 stop_sound = ""
 ```
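
Note on the `[postprocess]` `mode` values in the config block above: the README text in this PR says the older `advanced_local` and `agentic_rewrite` names are still accepted as deprecated aliases. The sketch below shows one way such alias handling could look. It is not taken from the `whispers` source: `PostprocessMode`, `ParsedMode`, and `parse_mode` are hypothetical names, and mapping both old names onto `rewrite` is an assumption based on the README wording.

```rust
/// Post-processing modes named in the README after this change.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PostprocessMode {
    Raw,
    Rewrite,
    LegacyBasic,
}

/// Canonical mode plus a flag recording whether the config used a
/// deprecated spelling (so a caller could print a warning).
#[derive(Debug, PartialEq, Eq)]
struct ParsedMode {
    mode: PostprocessMode,
    deprecated: bool,
}

/// Hypothetical helper: map a `mode` string from the config to a canonical mode.
/// Returns `None` for names the README does not mention.
fn parse_mode(value: &str) -> Option<ParsedMode> {
    let (mode, deprecated) = match value {
        "raw" => (PostprocessMode::Raw, false),
        "rewrite" => (PostprocessMode::Rewrite, false),
        // Assumed: both old names collapse into the single rewrite mode.
        "advanced_local" | "agentic_rewrite" => (PostprocessMode::Rewrite, true),
        "legacy_basic" => (PostprocessMode::LegacyBasic, true),
        _ => return None,
    };
    Some(ParsedMode { mode, deprecated })
}

fn main() {
    for name in ["raw", "rewrite", "advanced_local", "agentic_rewrite", "legacy_basic"] {
        println!("{name} -> {:?}", parse_mode(name));
    }
}
```

Keeping the deprecation flag separate from the canonical mode lets the rest of the pipeline match on three variants only, while the config loader decides how loudly to warn.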

-When `advanced_local` or `agentic_rewrite` is enabled, `whispers` also keeps a short-lived local session ledger in the runtime directory so immediate follow-up corrections like `scratch that` can safely replace the most recent dictation entry when focus has not changed. That session behavior is local either way; only the semantic rewrite stage may be cloud-backed.
+When `rewrite` is enabled, `whispers` also keeps a short-lived local session ledger in the runtime directory so immediate follow-up corrections like `scratch that` can safely replace the most recent dictation entry when focus has not changed. That session behavior is local either way; only the semantic rewrite stage may be cloud-backed.

 ## Cloud Modes

@@ -306,13 +303,13 @@ Custom rewrite models should include a chat template that `llama.cpp` can read f

 ## Personalization

-Dictionary replacements apply deterministically in `raw`, `advanced_local`, and `agentic_rewrite`, with normalization for case and punctuation but no fuzzy matching. In the rewrite modes, dictionary replacements are applied before the rewrite model and again on the final output so exact names and product terms stay stable.
+Dictionary replacements apply deterministically in `raw` and `rewrite`, with normalization for case and punctuation but no fuzzy matching. In rewrite mode, dictionary replacements are applied before the rewrite model and again on the final output so exact names and product terms stay stable.

 Spoken snippets also work in all modes. By default, saying `insert <snippet name>` expands the configured snippet text verbatim after post-processing finishes, so the rewrite model cannot paraphrase it. Change the trigger phrase with `personalization.snippet_trigger`.

 Custom rewrite instructions live in a separate plain-text file referenced by `rewrite.instructions_path`. `whispers` appends that file to the built-in rewrite prompt for both rewrite modes while still enforcing the same final-text-only output contract. The file is optional, and a missing file is ignored.

-`agentic_rewrite` also reads layered app rules from `agentic_rewrite.policy_path` and scoped glossary entries from `agentic_rewrite.glossary_path`. `whispers setup` creates commented starter files for both when you choose the agentic mode, and the minimal CRUD commands above are available for path/list/add/remove workflows.
+`rewrite` also reads layered app rules from `rewrite.policy_path` and scoped glossary entries from `rewrite.glossary_path`. `whispers setup` creates commented starter files for both when you choose rewrite mode, and the minimal CRUD commands above are available for path/list/add/remove workflows.
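
Note on the dictionary behavior described in the Personalization text above (case and punctuation normalization, no fuzzy matching, applied before the rewrite model and again on the final output): a rough sketch of that two-pass flow follows. Everything in it is illustrative and not taken from the `whispers` source. `split_punct`, `apply_dictionary`, and `postprocess` are hypothetical names, only single-word dictionary entries are handled, and the closure passed to `postprocess` stands in for the rewrite model.

```rust
use std::collections::HashMap;

/// Split a token into (leading punctuation, core, trailing punctuation).
fn split_punct(token: &str) -> (&str, &str, &str) {
    let start = token
        .find(|c: char| c.is_alphanumeric())
        .unwrap_or(token.len());
    let end = token
        .rfind(|c: char| c.is_alphanumeric())
        .map(|i| i + token[i..].chars().next().map_or(0, char::len_utf8))
        .unwrap_or(start);
    (&token[..start], &token[start..end], &token[end..])
}

/// Replace whole words found in `dict` (keys stored lowercase).
/// Matching ignores case and surrounding punctuation; there is no fuzzy
/// matching, so "waylands" does not match "wayland".
/// Simplification: runs of whitespace are collapsed to single spaces.
fn apply_dictionary(text: &str, dict: &HashMap<String, String>) -> String {
    text.split_whitespace()
        .map(|token| {
            let (lead, core, trail) = split_punct(token);
            match dict.get(&core.to_lowercase()) {
                Some(replacement) => format!("{lead}{replacement}{trail}"),
                None => token.to_string(),
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}

/// Apply the dictionary before and after an arbitrary rewrite step.
fn postprocess(
    text: &str,
    dict: &HashMap<String, String>,
    rewrite: impl Fn(&str) -> String,
) -> String {
    let before = apply_dictionary(text, dict);
    let rewritten = rewrite(&before);
    apply_dictionary(&rewritten, dict)
}

fn main() {
    let mut dict = HashMap::new();
    dict.insert("wayland".to_string(), "Wayland".to_string());
    dict.insert("onenoted".to_string(), "OneNoted".to_string());

    // Stand-in rewrite step that lowercases everything, to show why the
    // second dictionary pass matters.
    let out = postprocess("Ask onenoted about WAYLAND, please.", &dict, |s| {
        s.to_lowercase()
    });
    println!("{out}"); // prints "ask OneNoted about Wayland, please."
}
```

The second pass is what keeps exact names stable here: the stand-in rewrite step undoes the first pass's casing, and the final pass restores it.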

 ## Faster Whisper
